How do I merge two AFP files created from Apache FOP (XSLT)?

How do I merge a large number of separate AFP files, created using Apache FOP, into a single AFP file?
Any suggestions for tools are also welcome.
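AFP (MO:DCA) is a sequential stream of structured fields, so when each FOP output file is a complete, self-contained document, plain byte concatenation is sometimes all that is needed. A minimal sketch, assuming self-contained inputs and a consumer that tolerates multiple documents in one datastream (paths are illustrative):

    # merge_afp.py - minimal sketch: byte-level concatenation of AFP
    # files. Assumes each FOP-generated file is a self-contained AFP
    # document and that the consuming print server accepts several
    # documents in one datastream; verify on your target environment.
    from pathlib import Path

    def merge_afp(inputs, output):
        with open(output, "wb") as out:
            for path in inputs:
                out.write(Path(path).read_bytes())

    merge_afp(sorted(Path("afp_out").glob("*.afp")), "merged.afp")

If your environment instead requires a single Begin/End Document envelope around the merged content, a structured-field-aware library or commercial AFP tool is needed; test the concatenated output on the actual printer or viewer before processing the whole batch.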

Related

Django / Docker, manage a million images and many large files

The project I am working on relies on many static files, and I am looking for guidance on how to deal with them. First I will explain the situation, then I will ask my questions.
The Situation
The files that need management:
Roughly 1.5 million .bmp images, about 100 GB
Roughly 100 .h5 files, 250 MB each, about 25 GB
bmp files
The images are part of an image library; the user can filter through them based on several kinds of metadata. The metadata is spread out over multiple models, such as Printer, PaperType, and Source.
In development the images sit in the static folder of the Django project, this works fine for now.
h5 files
Each app has its own set of .h5 files. They are used to inspect user-generated images. Results of this inspection are stored in the database; the image itself is stored on disk.
Moving to production
Now that you know a bit about the problem, it is time to ask my questions.
Please note that I have never pushed a Django project to production before. I am also new to Docker.
Docker
The project needs to be deployed on multiple machines; to make this easier I decided to use Docker. I managed to build the image and run the container without the .bmp and .h5 files. So far so good!
How do I deal with the .h5 files? It does not seem like a good idea to build an image that is 25 GB in size. Is there a way to download the .h5 files at a later point in time? As in, building a Docker image that only contains the code and downloads the .h5 files later.
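One common pattern is exactly that: keep the image small and fetch the .h5 files when the container starts, for example from object storage or an internal file server. A minimal sketch, where the base URL, file list, and destination are hypothetical placeholders:

    # fetch_h5.py - minimal sketch: download the .h5 files at container
    # start instead of baking 25 GB into the image. The base URL, file
    # list and destination are hypothetical; point them at wherever you
    # host the files (S3, a blob store, an internal server).
    import os
    import urllib.request

    BASE_URL = os.environ.get("H5_BASE_URL", "https://files.example.com/h5/")
    FILES = ["model_001.h5", "model_002.h5"]   # or read from a manifest
    DEST = "/app/h5"

    os.makedirs(DEST, exist_ok=True)
    for name in FILES:
        target = os.path.join(DEST, name)
        if not os.path.exists(target):   # skip files already on the volume
            urllib.request.urlretrieve(BASE_URL + name, target)

Run this from the container entrypoint and mount the destination directory as a volume so the download happens only once per host; adding the .h5 and .bmp files to .dockerignore keeps them out of the build context entirely.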
Image files
I'm pretty sure that Django's collectstatic command is not meant for moving the number of images this project uses. I'm thinking along the lines of directly uploading the images to some kind of image server.
If there are specialized image servers I would love to hear your suggestions.
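For that volume of images, a common approach is to bypass collectstatic entirely and keep them in S3-compatible object storage, letting Django generate URLs that point straight at the store. A minimal sketch using django-storages; the bucket name, region, and credentials are placeholders:

    # settings.py - minimal sketch: store and serve the image library
    # from S3-compatible object storage via django-storages, bypassing
    # collectstatic for the 1.5 million files. Bucket name, region and
    # credentials are placeholders.
    INSTALLED_APPS += ["storages"]   # assumes INSTALLED_APPS is defined above

    DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
    AWS_STORAGE_BUCKET_NAME = "my-image-library"   # hypothetical bucket
    AWS_S3_REGION_NAME = "eu-west-1"
    AWS_QUERYSTRING_AUTH = False   # plain, cacheable public URLs

The images themselves can then be bulk-uploaded once (for example with aws s3 sync) instead of being pushed through Django or collectstatic.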

Generate Informatica mappings to save development effort

I have 100 files on Amazon S3, in various folders under 5 different buckets, which need to be loaded into 100 different Snowflake target tables. I can import the physical source and target objects and build the pipelines manually, but that will take me a long time.
Is there any alternative way to generate the 100 mappings using some script or command, to save development effort?
I am using Informatica Developer version 10.2.
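One workaround that avoids building 100 mappings by hand: export a single hand-built mapping to XML from the Developer tool, turn it into a template, and generate the rest by substitution before importing them on the command line. A minimal sketch; the ${SOURCE}/${TARGET}/${NAME} placeholders and the file/table pairs are assumptions you would adapt to the structure of your exported XML:

    # generate_mappings.py - minimal sketch: stamp out mapping XMLs from
    # a template exported from Informatica Developer. The ${SOURCE},
    # ${TARGET} and ${NAME} placeholders and the file/table pairs are
    # hypothetical; adapt them to your exported XML.
    from string import Template

    template = Template(open("mapping_template.xml").read())
    pairs = [
        ("s3_file_001.csv", "TGT_TABLE_001"),
        ("s3_file_002.csv", "TGT_TABLE_002"),
        # ... one entry per file/table pair, 100 in total
    ]

    for i, (src, tgt) in enumerate(pairs, start=1):
        name = "m_load_%03d" % i
        xml = template.substitute(SOURCE=src, TARGET=tgt, NAME=name)
        with open(name + ".xml", "w") as f:
            f.write(xml)
    # The generated XMLs can then be imported in bulk with infacmd's
    # object import command instead of clicking through the Developer
    # tool 100 times.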

How can Django serve content that can't be served via Nginx or Apache?

I need to serve some content that should be preprocessed before getting served. There is a huge volume of files (500,000 files with sizes around 1 GB, for example). My server is written in Django. I can do the preprocessing in Python (a Django view, for example), but I can't do it in Nginx or other static file servers. Is there any way to implement a Django view that serves these files efficiently, with random access? Are there any modules for this purpose? What should I take care of when implementing it?
P.S. The files are not saved anywhere; there are around 2,000 base files, and all the other files can be generated from these 2,000 files in the Python code (the Django view). I don't want to buy that much hard disk to preprocess and store all 500,000 final files.
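Django can serve this directly if the view honours the HTTP Range header, which is what gives clients random access. A minimal sketch; generate_bytes() and total_size() are assumed helpers that compute the requested slice of a derived file from one of the ~2,000 base files:

    # views.py - minimal sketch of a Django view that honours the HTTP
    # Range header so clients get random access to generated content.
    # generate_bytes(file_id, start, stop) and total_size(file_id) are
    # assumed helpers: the first yields chunks of the derived file, the
    # second returns its full length. Only single ranges like
    # "bytes=0-499" are handled here.
    import re
    from django.http import StreamingHttpResponse

    RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")

    def serve_generated(request, file_id):
        size = total_size(file_id)                        # assumed helper
        match = RANGE_RE.match(request.META.get("HTTP_RANGE", ""))
        if match:
            start = int(match.group(1))
            end = int(match.group(2) or size - 1)
            resp = StreamingHttpResponse(
                generate_bytes(file_id, start, end + 1),  # assumed helper
                status=206, content_type="application/octet-stream")
            resp["Content-Range"] = "bytes %d-%d/%d" % (start, end, size)
            resp["Content-Length"] = str(end - start + 1)
        else:
            resp = StreamingHttpResponse(
                generate_bytes(file_id, 0, size),
                status=200, content_type="application/octet-stream")
            resp["Content-Length"] = str(size)
        resp["Accept-Ranges"] = "bytes"
        return resp

For throughput, generate_bytes() should yield chunks rather than build the whole slice in memory; if the preprocessing is CPU-heavy, caching hot ranges is worth considering.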

How to use Talend to read/write SAS files without installing a SAS server?

I'd like to use Talend to manipulate SAS files; however, the SAS plugins require some sort of server authentication. I don't have a SAS server on my machine and I'd rather not install one if possible. Is there a way to read/write SAS files without installing the server?
Not easily! The simplest way would be to purchase a copy of Base SAS and use it as a local server - see the thread below.
How can I read a SAS dataset?
A cheaper way would be to purchase a licensed version of the WPS product:
http://en.wikipedia.org/wiki/World_Programming_System
A free, but less reliable way would be to use an open source reader, such as:
http://kasper.eobjects.org/2011/06/sassyreader-open-source-reader-of-sas.html
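If a detour outside Talend is acceptable, sas7bdat files can also be read without any SAS installation and converted into something Talend ingests natively. A minimal sketch using pandas; the paths and encoding are illustrative:

    # sas_to_csv.py - minimal sketch: convert a sas7bdat file to CSV with
    # pandas (no SAS server required), so Talend can read it with a plain
    # delimited-file input component. Paths and encoding are illustrative.
    import pandas as pd

    df = pd.read_sas("input.sas7bdat", format="sas7bdat", encoding="latin-1")
    df.to_csv("input.csv", index=False)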
Hope this helps.

Django-nonrel on Google App Engine: 3000 file limit

I followed the directions on http://www.allbuttonspressed.com/projects/djangoappengine, but afterwards realized that the resulting project has almost 5,000 files, mainly because of the django directory.
Am I not supposed to include Django 1.3, and instead just use the Django 1.2 built into Google App Engine? Or am I missing something? I've heard that zipimport is not a good idea with Django.
There are not many options:
You can try to remove every unnecessary lib from the django directory.
Use zipimport if you don't want to use the Django 1.2 that comes with GAE.
Reduce the number of files you use in your project.
But note that, for your instances, loading a lot of files is slower because it causes a lot of reads on the file system. A django.zip is read only once and kept in memory to unzip it, so there is just one read on the file system instead of 3,000 or more.
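For the zipimport route, the idea is that Python can import packages directly from a zip archive placed on sys.path, so the whole django package counts as a single file. A minimal sketch, assuming you have zipped the django package into django.zip next to your app code:

    # main.py - minimal sketch: import Django from a zip archive so the
    # whole package counts as one file toward the GAE limit.
    # Build the archive first, e.g.:  zip -r django.zip django -x "*.pyc"
    import os
    import sys

    # The archive must be on sys.path before anything imports django.
    sys.path.insert(0, os.path.join(os.path.dirname(__file__), "django.zip"))

    import django  # resolved from django.zip: one file-system read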