Referring to a file from the parent directory in a job/transformation in Kettle

I've run into a situation in a job/transformation in Kettle (PDI) where I need to refer to a job/transformation that lives in the directory above the current one.
E.g., the directory structure is /home/ubuntu/mainETL/Jobs/trans.
For a job that lives in mainETL, jobs are referred to as ${Internal.Job.Filename.Directory}/Jobs/testjob.kjb, and transformations as ${Internal.Job.Filename.Directory}/Jobs/trans/testtrans.ktr.
But I've hit a case where a job in mainETL needs to refer to a job/transformation in the /home/ubuntu directory.
Could someone please suggest a solution? Thank you.

You can use
${Internal.Job.Filename.Directory}/..
Note: if you're using Pentaho 8 or above, you should use the variable
${Internal.Entry.Current.Directory}
instead, as the other one is now deprecated.
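For example, if the current job lives in /home/ubuntu/mainETL, a reference like the following resolves one level up (the file name parent.kjb is a hypothetical example, not from the question):

${Internal.Entry.Current.Directory}/../parent.kjb
-> /home/ubuntu/mainETL/../parent.kjb
-> /home/ubuntu/parent.kjb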

Methods for opening a specific file inside the project WITHOUT knowing what the working directory will be

I've had trouble with this issue across many languages, most recently with C++.
The Issue Exemplified
Let's say we're working with C++ and have the following file structure for a project:
("Project" main folder with three [modules, data, etc] subfolders)
Now say:
Our maincode.cpp is in the Project folder
moduleA.cpp is in modules folder
data.txt is in data folder
moduleA.cpp wants to read data.txt
So the way I'd currently do it would be to assume maincode.cpp gets compiled and executed inside the Project folder, and so hardcode the relative path data/data.txt in moduleA.cpp to do the reading (say I used fstream fs("data/data.txt") to do so).
But what if the code was, for some reason, executed inside the etc folder?
Is there a way around this?
The Questions
Is this a valid question, or am I missing something fundamental about the working directory (wd) concept?
Are there any methods for working around absolute paths so as to solve this issue in C++?
Are there any universal methods for doing the same with any language?
If there are no reasonable methods, how would you approach this issue?
Please leave a comment if I missed any important details with the problem's illustration!
At some point the program has to make an assumption about where the file(s) are, either by getting the location from user input or by using a relative path with the presumed filename. As already said in the comments, C++ recently got std::filesystem (added in C++17), which can help you write cross-platform code that interacts with the host's filesystem.
That being said, every program, big or small, has to make certain assumptions at some point. Deleting or moving files that a program requires to be at a certain location under a certain name is problematic for any program, and is not solvable other than by presenting the user with an error message.
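As a minimal sketch of that idea (assuming C++17 and the data/data.txt layout from the question), you can at least detect a wrong working directory and report it instead of failing silently:

#include <filesystem>
#include <fstream>
#include <iostream>

int main() {
    namespace fs = std::filesystem;
    const fs::path dataFile = "data/data.txt"; // relative to the working directory

    // Print where we actually are, so a wrong working directory is obvious.
    std::cout << "working directory: " << fs::current_path() << '\n';

    if (!fs::exists(dataFile)) {
        std::cerr << "cannot find " << dataFile << " -- run from the Project folder\n";
        return 1;
    }
    std::ifstream in(dataFile);
    // ... read the file ...
}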
As @Hatted Rooster said, it's not generally solvable for some arbitrary file without making some assumptions; however, there are frameworks that allow you to "store" some files in resources embedded into the executable (or otherwise). Those frameworks usually allow you to handle such files in an opaque way, without the need to rely on the current working directory or relative paths.
For example, see the Qt Resource System.
Your program can deduce the path from argv[0] in the main call if you know that the data is always located relative to your executable, or you can use an absolute path like "C:\myProgram\data\data.txt".
The second approach works in every language.
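A sketch of the argv[0] approach (again assuming C++17; note that argv[0] is only a heuristic -- it is not guaranteed to contain a usable path on every platform):

#include <filesystem>
#include <fstream>
#include <iostream>

int main(int argc, char* argv[]) {
    namespace fs = std::filesystem;
    // Directory containing the executable, derived from how it was invoked.
    const fs::path exeDir = fs::absolute(fs::path(argv[0])).parent_path();
    // Resolve the data file relative to the executable, not the working directory.
    const fs::path dataFile = exeDir / "data" / "data.txt";

    std::ifstream in(dataFile);
    if (!in) {
        std::cerr << "could not open " << dataFile << '\n';
        return 1;
    }
    // ... read the file ...
}

On Linux, reading the /proc/self/exe symlink is a more reliable way to locate the executable.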

What is the consequence of using my own autoexec.sas file?

I want to write my own autoexec.sas file, but I don't want to lose any functionality that I might have had from the default autoexec.sas file(s).
When I do a SASHome directory search, I find many files with this name. Do all of these files execute by default after the SAS system initializes? Or just one of the files?
Where do I save my own autoexec.sas file so that the other files still execute, along with my own?
Just starting here. Thank you very much.
You really shouldn't edit the default autoexec.sas file: when you upgrade versions you'll lose all of your changes. There's actually a specific autoexec file designed just for this purpose -- autoexec_usermods.sas.
That's where you'll want to keep your site-specific modifications -- this way, when you upgrade versions, your transition will be relatively seamless.
I think you will want to edit the autoexec_usermods.sas file in these three locations:
/opt/biserver/Lev1/SASMeta
/opt/biserver/Lev1/SASMeta/MetadataServer
/opt/biserver/Lev1/SASMeta/WorkspaceServer
The following links from the SAS documentation should help answer your questions on the impact, storage location, and order of execution of autoexec files:
Customizing Your SAS Session by Using Configuration and Autoexec Files
Files Used by SAS -> SAS Autoexec File

Are there any risks/notes when writing output to different folders using MapReduce?

The goal is to write output to different folders (different paths) using one reducer.
I use the old MapReduce API, and I made a small modification to MultipleOutputs (loosening a restriction), and it works.
But the output format I use extends FileOutputFormat, and FileOutputCommitter is referenced by FileOutputFormat.
I find that there will be a _SUCCESS file in only one folder. Will that be a problem?
And there is still an empty part-00000 file; I don't know why it is generated.
_SUCCESS is written only once, after the job is complete. It is useful for checking whether the job has finished. I don't think there is any risk with that; just be aware that it is created only after the job completes, and know where to look for it if you rely on it.
Regarding the part- files, take a look at
map reduce output files: part-r-* and part-*
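To illustrate (the paths below are hypothetical), with output split across sub-folders only the job's configured output directory receives the marker and the default collector's file:

out/typeA/part-00000       <- written via the modified MultipleOutputs
out/typeB/part-00000
out/part-00000             <- empty file from the default collector
out/_SUCCESS               <- written once by FileOutputCommitter on job completion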

How to let a shared object access a data file at runtime (UNIX)

I have a class method (implemented in a shared object in a UNIX environment) which needs to access a text data file at runtime (using ifstream). Currently the method assumes that the data file can be opened without any relative path, i.e. something like
ifstream dataFile("data.txt");
The shared object is loaded from Python code, and in order for it to be available for loading, it is copied to the /usr/lib/ folder as a post-build step of the makefile. My question is how to make the text data file available to the shared object. I have considered the following possibilities:
Use some relative path, but that method is not totally fool-proof (the project is hosted on various instances and I cannot be sure the directory tree will stay the same a month from now).
Copy the data file to /usr/lib as well, but I feel this is the wrong approach.
Any suggestions are welcomed.
The proper way to go about this is to make the location of the text file a configurable value that will be set when your project is installed. Using a configuration file in /etc/ is a common way to store that value.
That way you can put the text file in e.g. /usr/share/ with all the machine-independent files (that data file is machine-independent, right?) and your code would "know" where to find it.
Note that if the data file is going to be modified as part of your code's operation, then it should probably be placed somewhere under /var (/var/lib or perhaps /var/cache) according to the Filesystem Hierarchy Standard (FHS) and most other Unix filesystem standards.
If the data file could be considered a configuration file, as you mentioned in one of your comments, you could just hard-code its path to somewhere under /etc/ (e.g. /etc/MyProject/data.cfg) and go on.
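A minimal sketch of that idea (the file name /etc/MyProject/data.cfg and its one-line format are assumptions for illustration, not a standard):

#include <fstream>
#include <iostream>
#include <string>

// Reads the data-file location from a one-line config file,
// falling back to a machine-independent default under /usr/share.
std::string dataFilePath() {
    std::ifstream cfg("/etc/MyProject/data.cfg"); // hypothetical config file
    std::string path;
    if (cfg && std::getline(cfg, path) && !path.empty())
        return path;
    return "/usr/share/MyProject/data.txt"; // default install location
}

int main() {
    std::ifstream dataFile(dataFilePath());
    if (!dataFile)
        std::cerr << "data file not found at configured location\n";
    // ... use dataFile ...
}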
I can think of two solutions :
When you load your shared object, you somehow give it the path to your file.
Instead of copying the file to /usr/lib you could create a symbolic link to it in /usr/lib, but that is not the best thing to do, IMHO.
The first solution is the best one for me.
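A sketch of the first solution, assuming the shared object exposes a setter that the Python side calls once after loading (all names here are hypothetical):

#include <fstream>
#include <string>

namespace {
std::string g_dataPath = "data.txt"; // fallback if the loader never calls the setter
}

extern "C" void set_data_path(const char* path) {
    // Called from Python (e.g. via ctypes) right after the library is loaded.
    g_dataPath = path;
}

void readData() {
    std::ifstream dataFile(g_dataPath);
    // ... parse the file ...
}

From Python this would look roughly like: lib = ctypes.CDLL("libmylib.so"); lib.set_data_path(b"/usr/share/myproject/data.txt").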

Where to put "plugins" in Linux

I am currently developing/hacking an image analyzing/transforming tool.
The filters therein will be loaded at runtime using dlopen & co.
My question is: where do *nix tools usually put plugins (*.so files) when installed?
bin/program
lib/program/plugins/thisandthat.so
maybe?
Secondly, how do I use it, and where do I put it during development without installing it? (This is probably the tricky part.)
I want to avoid shell scripts if possible.
Thanks in advance,
Ronny
Usually /usr/lib/programname should be a good spot.
During development I'd create a command-line parameter to specify the plugin search path and just leave the plugins in the build dir, for example.
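A sketch of that approach (the --plugin-dir flag and the /usr/lib/myprogram/plugins default are assumptions for illustration):

#include <dlfcn.h>
#include <filesystem>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    namespace fs = std::filesystem;
    // Installed default; during development override with: ./tool --plugin-dir ./build/plugins
    fs::path pluginDir = "/usr/lib/myprogram/plugins";
    for (int i = 1; i + 1 < argc; ++i)
        if (std::string(argv[i]) == "--plugin-dir")
            pluginDir = argv[i + 1];

    // Load every shared object found in the plugin directory.
    for (const auto& entry : fs::directory_iterator(pluginDir)) {
        if (entry.path().extension() != ".so")
            continue;
        void* handle = dlopen(entry.path().c_str(), RTLD_NOW);
        if (!handle) {
            std::cerr << "failed to load " << entry.path() << ": " << dlerror() << '\n';
            continue;
        }
        // ... look up each plugin's entry point with dlsym() ...
    }
}

(Link with -ldl on glibc-based systems.)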
Consider:
/usr/lib/program/*.so
A good guide for choosing is the Filesystem Hierarchy Standard (FHS).
Most Linux distributions use this standard.
Here is a very short summary.
Place application binary in:
/usr/bin/progname, /usr/local/bin/progname or /opt/progname
Place plugins or library files in:
/usr/lib/progname, /usr/local/lib/progname or /opt/progname/lib
Place host configuration for the application in:
/etc/progname or /etc/opt/progname
Place user configuration in:
$HOME/.progname
Place application manual page in:
/usr/share/man/man1/
There is a separate hierarchy under /var. As an example, use /var/log/progname for logging.
In response to caf's comment: I find it very useful to choose the target directory at compile time. Using a $PREFIX also makes it easy to separate development builds from shipment builds.
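As a sketch of the compile-time approach (the PLUGIN_DIR macro name is an assumption), the install prefix can be baked in by the build system and overridden for development builds:

// Build with e.g.: g++ -DPLUGIN_DIR='"/usr/local/lib/progname"' main.cpp
#ifndef PLUGIN_DIR
#define PLUGIN_DIR "/usr/lib/progname" // default when the build system sets nothing
#endif

#include <iostream>

int main() {
    std::cout << "loading plugins from " << PLUGIN_DIR << '\n';
}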
Most use /usr/bin/progname, /usr/lib/progname and /etc/progname
Do not forget:
$HOME/.program/
The layout seems sensible. During development you can, for instance, look in the current directory, consult an environment variable, or use a command-line switch. It depends on the details of your development environment and workflow.