Merge PDF files containing Optional Content - Ruby 2.2.4 - Rails 4 - ruby-on-rails-4

In a Rails 4 project I have been tasked with merging an “about” page to the end of PDF documents that have been uploaded via Paperclip.
The issue is that some of the uploaded pdfs contain optional content. I started out using combine_pdf but it does not support files with optional content as explained here. I have tried Prawn but it no longer support this functionality. I finally found the PDF Toolkit gem but the documentation does not say anything about optional content support. PDF Toolkit is a command line tool that has been wrapped in a Gem and therefore operates outside of the Rails application. I have tried using these command line examples in my CLI: pdftk file1.pdf file2.pdf cat output out_file.pdf but the terminal just hangs indefinitely.
The gem's documentation here is very unclear (to me) and may be the source of my issues.
My hope is to find advice on how to accomplish this using PDF Toolkit OR a better library for merging PDFs with optional content.
I have gotten this far by researching Stack Overflow questions like this and this
My OS is OSX 10.11.6
existing_pdf_path = #report.document.file.path
# Create a html template and convert it to pdf
about_company_html = render_to_string("_about_company.html.erb", layout: false)
about_company_pdf = WickedPdf.new.pdf_from_string(about_company_html, orientation: 'Landscape')
# Save about_company_pdf to file in tmp/pdf
about_company_pdf_path = Rails.root.join('tmp/pdf', 'about_company_partial.pdf').to_s
File.open(about_company_pdf_path, 'wb') { |file| file << about_company_pdf }
# Create and save a blank target file we will save everything to
combined_pdf_path = Rails.root.join('tmp/pdf', 'combined.pdf').to_s
FileUtils.touch(combined_pdf_path)
# This returns false OR just hangs depending on my exact syntax, no error or backtrace
result = PDF::Toolkit.pdftk( *%w(existing_pdf_path about_company_pdf_path cat output combined_pdf_path) )
I have tried as many variations of the above call to pdftk as I can find or think of. For example PDF::Toolkit.pdftk( existing_pdf_path, about_lux_pdf_path, 'cat', 'output', combined_pdf_path ) with no results.

Related

Extracting Outlook Attachment from Saved Email

I have an analysis project that is requiring me to extract the 'current state' of a PDF that houses our report that is sent out 4 times daily. I have the code written to scrape my PDF but I need to figure out how to extract the PDF from the email so I can step through it with my code.
I tried using the code below
import win32com.client
import os
location = r'C:\Users\myusername\OneDrive - companyinfo\Department Projects\TestEmails'
files = [f for f in os.listdir(location)]
print(files)
for file in files:
if file.endswith('.msg'):
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(file)
att = msg.Attachments
for i in att:
i.SaveAsFil`e(os.path.join(r'C:\Users\username\OneDrive - companyname\Department Projects\TestPDF', i.FileName))
The error it produces is:
pywintypes.com_error: (-2147352567, 'Exception occurred.', (4096, u'Microsoft Outlook', u"We can't open 'Stats Report.msg'. It's possible the file is already open, or you don't have permission to open it.\n\nTo check your permissions, right-click the file folder, then click Properties.", None, 0, -2147287038), None)
I am only currently testing with one saved test.msg file but I have over 1400 I need to parse through. Maybe this isn't the best technique as I know VBA could do something similar within outlook, but I don't have much skills in the VBA region.
I have outlook 2016 installed on Windows 7 computer running python 2.7. Is this error something easy to fix? Is there a better technique to take an attached PDF and save it to a folder so my other program can grab the necessary data?
Desired output: PDF Attachment is Extracted and Saved into a separate folder.
Thank you for your help and expertise,
Andy
So I figured out the answer and how simple and stupid it was makes me unreasonably frustrated.....
My working directory was wrong even though I grabbed the file, the file name was the only item created.
I created a true_location variable that gave it the true full working directory and it worked like a charm.
true_location = location + '\\' + file
Enter that in the for loop under the if clause and it works like a charm.
Best,
Andy

How to install LaTeX class on Heroku?

I have a Django app hosted on Heroku. In it, I am using a view written in LaTeX to generate a pdf on-the-fly, and have installed the Heroku LaTeX buildpack to get this to work. My LaTeX view is below.
def pdf(request):
context = {}
template = get_template('cv/cv.tex')
rendered_tpl = template.render(context).encode('utf-8')
with tempfile.TemporaryDirectory() as tempdir:
process = Popen(
['pdflatex', '-output-directory', tempdir],
stdin=PIPE,
stdout=PIPE,
)
out, err = process.communicate(rendered_tpl)
with open(os.path.join(tempdir, 'texput.pdf'), 'rb') as f:
pdf = f.read()
r = HttpResponse(content_type='application/pdf')
r.write(pdf)
return r
This works fine when I use one of the existing document classes in cv.tex (eg. \documentclass{article}), but I would like to use a custom one, called res. Ordinarily I believe there are two options for using a custom class.
Place the class file (res.cls, in this case) in the same folder as the .tex file. For me, that would be in the templates folder of my app. I have tried this, but pdflatex cannot find the class file. (Presumably because it is not running in the templates folder, but in a temporary directory? Would there be a way to copy the class file to the temporary directory?)
Place the class file inside another folder with the structure localtexmf/tex/latex/res.cls, and make pdflatex aware of it using the method outlined in the answer to this question. I've tried running the CLI instructions on Heroku using heroku run bash, but it does not recognise initexmf, and I'm not entirely sure how to specify a relevant directory.
How can I tell pdflatex where to find to find the class file?
Just 2 ideas, I don't know if it'll solve your problems.
First, try to put your localtexmf folder in ~/texmf which is the default local folder in Linux systems (I don't know much about Heroku but it's mostly Linux systems, right?).
Second, instead of using initexmf, I usually use texhash, it may be available on your system?
I ended up finding another workaround to achieve my goal, but the most straightforward solution I found would be to change TEXMFHOME at runtime, for example...
TEXMFHOME=/d pdflatex <filename>.tex
...if you had /d/tex/latex/res/res.cls.
Credit goes to cfr on tex.stackexchange.com for the suggestion.

How do I create text file for Confluence and then import it as a Confluence wiki page?

Here is what I want to accomplish:
I am writing a script which will parse some source code, extract some comments that I want it to extract and I will store this text in a text file.
I want to write another script that uses the content of this text file to be programatically transformed into a Confluence wiki-page.
Please tell me the best way to do this. I already saw this
I felt that I could change the input in the above example to take input from text file and update contents of Confluence page. But, I am not sure how it will be formatted. If I have to format it, what do I need to do?
Thanks in advance!
As an alternative to XML-RPC you can use the integrated WebDAV plugin.
Write a script that creates a directory in the selected space.
The directory name will be the page name. After creating the directory a text-file with the same name (with .txt extension) will be created in the directory which holds the content of the page
let your script edit this file in insert the content of your text-file.
Information about the usage of the plugin:
Configuring a WebDAV client for Confluence
Confluence WebDAV Plugin
Troubleshooting WebDAV
I Also saw the code you are referring to. I tweaked it a bit a produced the following code that me help you for the second part of your question.
My code creates a new space and also a new page from a text file that is inserted in the previously created space.
import sys
import xmlrpc.client
import os
import re
# Connects to confluence server with username and password
site_URL = "YOUR_URL"
server = xmlrpc.client.ServerProxy(site_URL + "/rpc/xmlrpc")
username = "YOUR_USERNAME"
pwd = "YOUR_PASSWORD"
token = server.confluence2.login(username, pwd)
# The space you want to add a page to
spacekey = "YOUR_SPACENAME"
# Retrives text from a file
f = open('FileName.txt', 'r')
content = f.read()
f.close()
# Creates a new page to insert in the new space from text file content
newpage = {"title":"NEW_PAGENAME", "space":spacekey, "content":content}
server.confluence2.storePage(token, newpage)
server.confluence2.logout(token)
For your formatting issues, html is supported, but I have not quite figured out how to use CSS styles other than inline (everthing else does not seem to work).
These styles work when you write them with the HTML macro inside Confluence, but from the remote API, it does not seem to behave the same way.
UPDATE :
You can use the {html} macro to include your html by using :
content = server.confluence2.convertWikiToStorageFormat(token, content)
You can also specify your CSS in your global CSS stylesheet.
Another option is to develop a plugin to include a CSS resource :
https://developer.atlassian.com/display/CONFDEV/Creating+a+Stylesheet+Theme
I have tried this way and works petty well for me:
I used a Java program to created a programmatic client to create the dynamic content. This created a HTML document out of my plain text.
I used a RPC client connected to Confluence to uploaded it as a new page.
Your html will be preserved. But if you want to add CSS or JS/JQuery etc on top of your html you will need to create a Macro and enable it for the Particular page. This feature is not available in Confluence OnDemand.

how can I migrate issues from redmine to tuleap

Originally we use Redmine as issue management system, now we are planning to migrate to Tuleap system.
Both system have features to import/export issues into .csv file.
I want to know whether there is standard / simple way to migrate issues.
The main items inside issues are status, title and description.
What are "remaining_effort" and "cross_references" kind of data in remind ?
Since both system can export the csv file, which contains the item header that they needed, some header is different.
It needs scripts to map from one system to another system, code snippet is shown below.
It can work for other ALM system if they don't support from application (I mean migration).
#!/usr/bin/env python
import csv
import sys
# read sample tuleap csv header to avoid some field changes
tuleapcsvfile = open('tuleap.csv', 'rb')
reader = csv.DictReader(tuleapcsvfile)
to_del = ["remaining_effort","cross_references"]
# remove unneeded items
issueheader = [i for i in reader.fieldnames if not i in to_del]
# open stdout for output
w = csv.DictWriter(sys.stdout, fieldnames=issueheader,lineterminator="\n")
w.writeheader()
# read redmine csv files for converting
redminecsvfile = open('redmine.csv', 'rb')
redminereader = csv.DictReader(redminecsvfile)
for row in redminereader:
newrow = {}
if row['Status']=='New':
newrow['status'] = "Not Started"
# some simple one to one mapping
newrow['i_want_to' ]= row['Subject']
newrow['so_that'] = row['Description']
w.writerow(newrow)
some items in exported csv can't be imported back in tuleap like
remaining_effort,cross_references.
These two items are shown inside exported .csv file from tuleap issues.
Had the same issue and the csv solution looked too limited to me:
the field matching between tracker and csv content must fit exactly
you can't import attachments
you can't link artifacts
...
Issues can be extracted from Redmine using REST API or by directly reading the SQL database. Artifacts can be created in Tuleap using the REST API. You "just" need a script in the middle to extract issues from Redmine and then import them into Tuleap.
I created such a script in Python:
It has a plugin approach so that it could import issues/bugs from any bug tracker and later save them to any other bug tracker.
For now it only support extracting issues from Redmine SQL database and export to Tuleap using REST API.
One can extend it (new plugin) to extract issues from other trackers (bugzilla/mantis/gitlab).
One can extend it (new plugin) to generate a Tuleap xml file rather than importing the artifacts using Tuleap REST API (XML being more powerful here).
I ported hundreds of issues from Redmine to Tuleap using this and it was good enough for my needs.
Have a look at https://github.com/jpo38/TrackerIO.

How do I change TOC location in wkhtmltopdf output?

I need the TOC to be on the 3rd page. Apparently, if there is a way to control this, it has to be through the XSL stylesheet. All my search attempts didn't give me a clue though. Is it possible at all?
I can't use cover option since I need the header to show on both first and second pages.
OK I've found a way to do it using wkhtmltopdf 0.11. I've extracted the first two pages into a separate HTML document and then run it like this
wkhtmltopdf [options] page cover.html toc --xsl-style-sheet ... input_file.html out.pdf
Unfortunately it took a lot more effort than I expected, since I'm using it in a Rails application through wicked_pdf, and it doesn't play nice with the new options format, so I had to fork it and make the necessary changes as well.
The command line generated by wicked_pdf looks like this (long paths omitted):
"c:/program files (x86)/wkhtmltopdf/wkhtmltopdf.exe"
--header-html "file:///path/to/header" --footer-html "file:///path/to/footer"
--margin-top 20 --margin-bottom 15 --margin-left 5 --margin-right 40 page "file:///path/to/cover/page" --disable-javascript toc --xsl-style-sheet "path/to/style/sheet" - -