I have an application that loads different documents to the server, and allows users to read documents' content.
I am uploading the documents to the server, and then I try to read the courses by id, like:
def view_course(request, id):
    u = Courses.objects.get(pk=id)
    # etc.
But I can't find anywhere how to actually read the content of a .doc/.pdf/.txt file and display it on a web page.
Reading plain text files is trivial, while PDF and Word processing is not. For the latter two you'll have to incorporate some external libraries.
Text: f.read()
Word: extracting text from MS word files in python
PDF: http://www.unixuser.org/~euske/python/pdfminer/index.html
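For the plain-text case, a minimal sketch could look like the following. The helper name `text_file_as_html` and the UTF-8 default are assumptions for illustration, not part of the answer above. It reads the file, escapes it, and wraps it in a `<pre>` element so it can be handed to a template or response:

```python
import html
from pathlib import Path


def text_file_as_html(path, encoding="utf-8"):
    """Read a plain-text file and return an HTML-safe <pre> block."""
    # errors="replace" keeps the view from crashing on a badly encoded upload.
    text = Path(path).read_text(encoding=encoding, errors="replace")
    # Escape <, > and & so the document content cannot inject markup.
    return "<pre>" + html.escape(text) + "</pre>"
```

The Word and PDF cases would plug their extracted text into the same escaping step.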
Related
Is there a convention for where to put miscellaneous files which are opened and read by Django views, but which aren't Python code or HTML templates and aren't processed by the template renderer?
(In my case, I have a View which returns an application/excel .xlsx file to the user. It is generated by reading in a "template" xlsx file with lots of tricky formatting and formulae but no data values. Openpyxl is used to insert selected data from the db, and the resulting workbook is saved and sent to the user.)
I would say this belongs in the media root directory, which is conventionally called "media", or in some subdirectory of it. So in your app this is a subdirectory at <app-dir>/media or maybe /media/xlsx/.
See also here in the documentation: https://docs.djangoproject.com/en/4.1/topics/files/#file-storage
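As a rough sketch of that layout, the view could resolve the template workbook relative to the media root. The helper name `xlsx_template_path` and the `xlsx` subdirectory are assumptions; in a real project `media_root` would come from `settings.MEDIA_ROOT`:

```python
from pathlib import Path


def xlsx_template_path(media_root, name):
    """Return the path of a template workbook stored under <media root>/xlsx/."""
    # media_root is assumed to be the value of settings.MEDIA_ROOT.
    return Path(media_root) / "xlsx" / name
```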
I have an HTML page that I want to convert to PDF as several copies within the same document, one per database record. But when I do this, only the last row of my data is generated. I want a PDF document whose number of pages equals the number of records in my database.
I had the same problem two months ago. After trying several libraries, I realized that the Chrome browser is the best tool for converting HTML to PDF, so I did the following:
1 - Make a special view that returns the original HTML.
2 - Use Chrome from the CLI to convert this page to PDF and save it, like:
os.system( f'chromium --headless --disable-gpu --print-to-pdf=<location to save>.pdf --no-margins <your_view_url>')
3 - Then read the saved file from that location and return it as a normal file response.
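Step 2 can be sketched with `subprocess` instead of `os.system`, which avoids building a shell string from a URL. The function names, the `chromium` binary name, and the timeout are assumptions. Only the `--headless`, `--disable-gpu`, and `--print-to-pdf` flags from the command above are used; `--no-margins` is left out because it could not be verified:

```python
import subprocess


def build_chromium_command(url, output_pdf, binary="chromium"):
    """Build the argument list for a headless Chromium PDF export."""
    return [
        binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_pdf}",
        url,
    ]


def render_pdf(url, output_pdf, timeout=60):
    """Run Chromium and return the path of the generated PDF."""
    # check=True raises CalledProcessError if Chromium exits with an error.
    subprocess.run(build_chromium_command(url, output_pdf), check=True, timeout=timeout)
    return output_pdf
```

Passing the arguments as a list means the URL is never interpreted by a shell.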
I have a view in my Django project that should read the content of an uploaded .txt file from an input type="file". With Arabic content, though, it doesn't print the actual text but a series of codes like "\xd9\x88\xd9\x82\xd8\xa7\xd9\x84". I couldn't find a solution for this; the file is perfectly viewable on my PC, and my website is the one exporting that file in "utf-8". Any help here?
Uploaded_File = request.FILES["Doc"]
for chunk in Uploaded_File.chunks(chunk_size=None):
    print(chunk)
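The printed codes are the raw UTF-8 bytes of the text, since each chunk is a `bytes` object rather than a `str`. A minimal sketch of decoding them follows; the helper name `decode_chunks` is an assumption, and an incremental decoder is used so that a multi-byte character split across two chunks is still decoded correctly:

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks into one string."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # The decoder buffers an incomplete character until the next chunk arrives.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises an error if the data ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

In the view this would be called as `decode_chunks(request.FILES["Doc"].chunks())`.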
I have a few html pages, each with a number of posts that follow a given pattern and that contain a lot of different information, among others a well-identified url and an associated name and date. I would like to produce a table containing date + name + url in separate columns and ignore the rest of the text in the document (both data and html formatting).
I was thinking of using OpenOffice and its regex functions to do so but I don’t see how I would do the actual extraction from html to a table (I am familiar with search and replace but am not sure there is a way to do extraction; Jan Dvorak’s third comment on the question How to extract file name from random image <img> tags in Open Office speaks against it).
Is there a good way to do this text extraction, in OpenOffice or with any other tool?
Since you're parsing HTML, it would be easier to use an HTML parsing engine. For example in PHP you could pull all the links or all the images from a page with a few simple lines.
// Requires the PHP Simple HTML DOM Parser library, which provides file_get_html()
// Create DOM from URL or file
$html = file_get_html('path and file name');
// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
This could be further refined if you had some additional information about the values being pulled and how they are stored in the file.
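The same parser-based idea can be sketched with Python's standard library. The `LinkExtractor` class and `extract_links` function are hypothetical names. The sketch only collects the link text and URL of each `<a>` element; the date column is omitted because it depends on post markup that is not shown in the question:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html_text):
    """Return a list of (name, url) tuples found in the given HTML."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The resulting rows can be written out with the `csv` module and opened as a table in OpenOffice.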
Here is what I want to accomplish:
I am writing a script that will parse some source code, extract the comments I am interested in, and store that text in a text file.
I want to write another script that programmatically turns the content of this text file into a Confluence wiki page.
Please tell me the best way to do this. I already saw this
I felt that I could change the example above to take its input from a text file and update the contents of a Confluence page. But I am not sure how it will be formatted. If I have to format it, what do I need to do?
Thanks in advance!
As an alternative to XML-RPC you can use the integrated WebDAV plugin.
Write a script that creates a directory in the selected space.
The directory name will be the page name. After the directory is created, a text file with the same name (with a .txt extension) is created inside it; this file holds the content of the page.
Let your script edit this file and insert the content of your text file.
Information about the usage of the plugin:
Configuring a WebDAV client for Confluence
Confluence WebDAV Plugin
Troubleshooting WebDAV
I also saw the code you are referring to. I tweaked it a bit and produced the following code, which may help you with the second part of your question.
My code creates a new page from a text file and adds it to the given space.
import xmlrpc.client

# Connect to the Confluence server with username and password
site_URL = "YOUR_URL"
server = xmlrpc.client.ServerProxy(site_URL + "/rpc/xmlrpc")
username = "YOUR_USERNAME"
pwd = "YOUR_PASSWORD"
token = server.confluence2.login(username, pwd)
# The key of the space you want to add a page to
spacekey = "YOUR_SPACENAME"
# Retrieve the text from a file (the with block closes it automatically)
with open('FileName.txt', 'r') as f:
    content = f.read()
# Create a new page in that space from the text file content
newpage = {"title": "NEW_PAGENAME", "space": spacekey, "content": content}
server.confluence2.storePage(token, newpage)
server.confluence2.logout(token)
For your formatting issues, HTML is supported, but I have not quite figured out how to use CSS styles other than inline ones (nothing else seems to work).
These styles work when you write them with the HTML macro inside Confluence, but from the remote API, it does not seem to behave the same way.
UPDATE :
You can use the {html} macro to include your html by using :
content = server.confluence2.convertWikiToStorageFormat(token, content)
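A minimal sketch of that conversion step follows. It assumes the HTML macro is enabled on your Confluence instance, and `html_to_storage_format` is a hypothetical helper name. `server` and `token` are the XML-RPC proxy and login token from the script above:

```python
def html_to_storage_format(server, token, html_fragment):
    """Wrap an HTML fragment in the {html} macro and convert it for storePage."""
    # Wiki markup for the HTML macro: the fragment sits between two {html} tags.
    wiki_markup = "{html}\n" + html_fragment + "\n{html}"
    return server.confluence2.convertWikiToStorageFormat(token, wiki_markup)
```

The returned string would then be used as the "content" value of the page dictionary passed to storePage.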
You can also specify your CSS in your global CSS stylesheet.
Another option is to develop a plugin to include a CSS resource :
https://developer.atlassian.com/display/CONFDEV/Creating+a+Stylesheet+Theme
I have tried it this way and it works pretty well for me:
I used a Java program as a programmatic client to create the dynamic content. It generated an HTML document from my plain text.
I used an RPC client connected to Confluence to upload it as a new page.
Your HTML will be preserved. But if you want to add CSS or JS/jQuery on top of your HTML, you will need to create a macro and enable it for that particular page. This feature is not available in Confluence OnDemand.