I have an HTML page that I want to convert to PDF several times, with all the copies in the same document. But when I do this, only the last line of my file is generated. I want a single PDF document whose number of pages equals the size of my database.
I had the same problem two months ago. After trying several libraries to handle this, I realized that the Chrome browser is the best tool for converting HTML to PDF, so I did the following:
1 - Created a special view that returns the original HTML.
2 - Used Chrome from the CLI to convert this page to PDF and save it, like this:
os.system( f'chromium --headless --disable-gpu --print-to-pdf=<location to save>.pdf --no-margins <your_view_url>')
3 - Then read the file from that location and return it as a normal file.
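Step 2 can be sketched as below. This is a minimal sketch, assuming the `chromium` binary is on the PATH; the function names `build_pdf_command` and `render_pdf` are hypothetical. It uses `subprocess.run` with an argument list instead of `os.system`, so the URL and output path are not interpreted by a shell. The `--no-margins` flag from the command above is left out; add it to the list if your Chromium build accepts it.

```python
import subprocess


def build_pdf_command(url, output_path, binary="chromium"):
    """Return the argument list for a headless Chromium print-to-PDF run."""
    return [
        binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_path}",
        url,
    ]


def render_pdf(url, output_path, binary="chromium"):
    """Run Chromium; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_pdf_command(url, output_path, binary), check=True)
```

Calling `render_pdf(<your_view_url>, <location to save>)` from the view and then returning the saved file covers steps 2 and 3.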
I have a few html pages, each with a number of posts that follow a given pattern and that contain a lot of different information, among others a well-identified url and an associated name and date. I would like to produce a table containing date + name + url in separate columns and ignore the rest of the text in the document (both data and html formatting).
I was thinking of using OpenOffice and its regex functions to do so, but I don’t see how I would do the actual extraction from HTML to a table. I am familiar with search and replace but am not sure there is a way to do extraction; Jan Dvorak’s third comment on the question "How to extract file name from random image <img> tags in Open Office" speaks against it.
Is there a good way to do this text extraction, in OpenOffice or with any other tool?
Since you're parsing HTML, it would be easier to use an HTML parsing engine. For example in PHP you could pull all the links or all the images from a page with a few simple lines.
// Requires the Simple HTML DOM Parser library
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('path and file name');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
This could be further refined if you had some additional information about the values being pulled and how they are stored in the file.
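The same idea can be sketched with Python's standard library. This is a minimal sketch, assuming each post exposes its URL as an `<a href>` element whose text is the name; the class `LinkCollector` and the function `links_to_csv` are hypothetical names. Extracting the date is not shown, because it depends on how the posts mark it up.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html_text):
    """Return CSV text with one name,url row per link found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "url"])
    writer.writerows(parser.links)
    return out.getvalue()
```

The resulting CSV text can be saved to a file and opened in OpenOffice Calc as a table; everything outside the links is ignored.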
Here is what I want to accomplish:
I am writing a script which will parse some source code, extract some comments that I want it to extract and I will store this text in a text file.
I want to write another script that programmatically turns the content of this text file into a Confluence wiki page.
Please tell me the best way to do this. I already saw this
I felt that I could change the input in the above example to take input from text file and update contents of Confluence page. But, I am not sure how it will be formatted. If I have to format it, what do I need to do?
Thanks in advance!
As an alternative to XML-RPC you can use the integrated WebDAV plugin.
Write a script that creates a directory in the selected space.
The directory name will be the page name. After the directory is created, a text file with the same name (with a .txt extension) is created in it; this file holds the content of the page.
Let your script edit this file and insert the content of your text file.
Information about the usage of the plugin:
Configuring a WebDAV client for Confluence
Confluence WebDAV Plugin
Troubleshooting WebDAV
I also saw the code you are referring to. I tweaked it a bit and produced the following code, which may help you with the second part of your question.
My code creates a new page from a text file and adds it to the space you specify.
import xmlrpc.client
# Connects to confluence server with username and password
site_URL = "YOUR_URL"
server = xmlrpc.client.ServerProxy(site_URL + "/rpc/xmlrpc")
username = "YOUR_USERNAME"
pwd = "YOUR_PASSWORD"
token = server.confluence2.login(username, pwd)
# The space you want to add a page to
spacekey = "YOUR_SPACENAME"
# Retrieves text from a file
with open('FileName.txt', 'r') as f:
    content = f.read()
# Creates a new page in the space from the text file content
newpage = {"title": "NEW_PAGENAME", "space": spacekey, "content": content}
server.confluence2.storePage(token, newpage)
server.confluence2.logout(token)
For your formatting issues, HTML is supported, but I have not quite figured out how to use CSS styles other than inline (everything else does not seem to work).
These styles work when you write them with the HTML macro inside Confluence, but from the remote API, it does not seem to behave the same way.
UPDATE :
You can use the {html} macro to include your HTML by converting the content first:
content = server.confluence2.convertWikiToStorageFormat(token, content)
You can also specify your CSS in your global CSS stylesheet.
Another option is to develop a plugin to include a CSS resource :
https://developer.atlassian.com/display/CONFDEV/Creating+a+Stylesheet+Theme
I have tried it this way and it works pretty well for me:
I used a Java program as a programmatic client to create the dynamic content. It generated an HTML document from my plain text.
I used an RPC client connected to Confluence to upload it as a new page.
Your HTML will be preserved. But if you want to add CSS or JS/jQuery on top of your HTML, you will need to create a macro and enable it for the particular page. This feature is not available in Confluence OnDemand.
I am working on a project. I want to read a file whose path comes from a URL. The file contains XML data, and I have to show this data as a chart.
Basically, your steps may be these:
Validate the URL data (StructKeyExists + FileExists + isFile).
Read and parse XML file, you can do this with XmlParse.
Convert XML object into the query (see query functions).
Render the data using great charting tags.
If you want more detailed help -- please expand your question, to make it more specific.
I'm writing a Django app to serve some documentation written in RestructuredText.
I have many documents written in *.rst; each of them is quite long, with many sections, subsections and so on.
Displaying the whole document in a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, with links to a URL where I can display a single section / subsection (which will need some 'previous | up | home | next' links, I guess...). In a way, similar to a 'multiple HTML page output' as in a docbook / XML to HTML conversion.
Can anyone point me in some direction to build a document tree of a *.rst document and parse a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.
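Loading one page of that JSON output for a template can be sketched as follows. This is a minimal sketch, assuming the JSON builder wrote one `.fjson` file per document and that each file carries `title` and `body` keys; the function name `load_sphinx_page` and the directory layout are hypothetical.

```python
import json
from pathlib import Path


def load_sphinx_page(json_dir, page_name):
    """Load the title and HTML body of one page from Sphinx JSON output.

    Assumes <json_dir>/<page_name>.fjson exists and holds a JSON object.
    """
    path = Path(json_dir) / f"{page_name}.fjson"
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Missing keys fall back to empty strings so the template still renders
    return {"title": data.get("title", ""), "body": data.get("body", "")}
```

A Django view could pass the returned dictionary to a template as context and output `body` without escaping, since it is already HTML.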
I have an application that loads different documents to the server, and allows users to read documents' content.
I am uploading the documents to the server, and then I try to read the courses by id, like:
def view_course(request, id):
    u = Courses.objects.get(pk=id)
    # etc
But I can't find anywhere how to actually read the content of a .doc/.pdf/.txt file and display it on a web page.
Reading plain text files is trivial, while PDF and Word processing is not. For the latter two you'll have to incorporate some external libraries.
Text: f.read()
Word: extracting text from MS word files in python
PDF: http://www.unixuser.org/~euske/python/pdfminer/index.html
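For the plain-text case only, a minimal sketch using the standard library; the function name `text_file_as_html` is hypothetical, and the file is assumed to be UTF-8. The text is escaped so that characters such as `<` display literally on the page. Word and PDF files still need the external libraries listed above.

```python
import html
from pathlib import Path


def text_file_as_html(path):
    """Read a plain-text file and return it as escaped HTML in a <pre> block."""
    # errors="replace" keeps the view from failing on stray non-UTF-8 bytes
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return "<pre>" + html.escape(text) + "</pre>"
```

The returned string can be handed to the response or template from a view such as `view_course`.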