How to validate generic options in MapReduce?

I am passing a configuration properties file to a MapReduce program as follows:
hadoop jar myprogram.jar -conf config-props.xml
Within my run method, a Job object is created as below:
Configuration conf = new Configuration();
// I want to validate that one configuration properties file is passed here
Job job = new Job(conf, getClass().getSimpleName());
While this works fine, I want to add code before creating the Job object to validate that exactly one configuration properties file was passed. I'm looking for help on how to do this, preferably using a GenericOptionsParser.

GenericOptionsParser is a class that interprets common Hadoop command-line options and sets them on a Configuration object for your application to use as desired. You don’t usually use GenericOptionsParser directly, as it’s more convenient to implement the Tool interface and run your application with the ToolRunner, which uses GenericOptionsParser internally.
See the GenericOptionsParser documentation for more details on using it.
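A minimal sketch of that validation, using GenericOptionsParser directly before the Job is created. It relies on the parser's getCommandLine() accessor to inspect the parsed generic options; treat the exact accessor as an assumption to verify against your Hadoop version:

import org.apache.commons.cli.CommandLine;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyProgram {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // GenericOptionsParser consumes -conf, -D, -fs, etc. and applies
        // them to conf; anything left over is an application argument.
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);

        // Validate that exactly one -conf file was supplied.
        CommandLine cli = parser.getCommandLine();
        String[] confFiles = (cli == null) ? null : cli.getOptionValues("conf");
        if (confFiles == null || confFiles.length != 1) {
            System.err.println("Usage: hadoop jar myprogram.jar -conf <config-props.xml>");
            System.exit(2);
        }

        Job job = new Job(conf, MyProgram.class.getSimpleName());
        // ... remaining job setup ...
    }
}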

Related

Boost.Log Configuration Files

I'm adding logging to an old C++ program. After some research, I've decided to use Boost.Log. The documentation is filled with examples of creating sinks and filters. However, I couldn't find any example of a log configuration file.
Is there a way to configure logging from a file that doesn't have to be compiled? Similar to what log4net has? Or Python (well, since Python isn't compiled anyway...)?
Eventually I found the official documentation; either it was added recently, or it is well hidden so that I didn't see it before:
http://www.boost.org/doc/libs/1_57_0/libs/log/doc/html/log/detailed/utilities.html#log.detailed.utilities.setup.settings_file
Unfortunately, I can't find an exhaustive answer either, but here are some observations:
Certainly it is possible to use a configuration file:
boost::log::init_from_stream(std::basic_istream< CharT > &)
Example of the file (from Boost log severity_logger init_from_stream):
[Sinks.MySink]
Destination=Console
Format="%LineID%: <%Severity%> - %Message%"
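A minimal sketch of loading such a file at startup (assuming Boost.Log v2; the file name settings.ini is illustrative):

#include <fstream>
#include <boost/log/utility/setup/from_stream.hpp>
#include <boost/log/utility/setup/common_attributes.hpp>

int main()
{
    // Read sinks, formats and filters from the settings file.
    std::ifstream settings("settings.ini");
    boost::log::init_from_stream(settings);

    // Register LineID, TimeStamp, etc. so format strings can reference them.
    boost::log::add_common_attributes();
}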
From the following link you can identify additional valid setting keys and values (e.g. Destination=TextFile, Filter=, AutoFlush=, FileName=)
http://boost.2283326.n4.nabble.com/log-init-from-settings-problem-with-applying-format-and-filter-td3643483.html
Constants in Boost's parser_utils.hpp give another idea of the keywords supported by default in the configuration file (e.g. the [Core] section with the DisableLogging key).
Providing settings for user defined types is described here (with a corresponding snippet of the configuration file at the end of the page):
http://www.boost.org/doc/libs/1_57_0/libs/log/doc/html/log/extension/settings.html
It seems to me that it is difficult to find a description of the configuration file format because the valid entries are derived from the source code implementing the sinks, filters, etc. That implementation may even be user defined, so an explicit, complete description of the configuration format is impossible.
Maybe you can try to create your configuration programmatically first and, when transforming it into configuration file form, open separate questions for the particular properties you cannot figure out how to set.

Assign Global Variable/Argument for Any Build to Use

I have several (15 or so) builds which all reference the same string of text in their respective build process templates. Every 90 days that text expires and needs to be updated in each of the templates. Is there a way to create a central variable or argument that every build can reference?
One solution would be to create an environment variable on your build machine. Then reference the variable in all of your builds. When you needed to update the value you would only have to set it in one place.
How to: Use Environment Variables in a Build
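For instance, an environment variable named ExpiringText (illustrative) set once on the build machine surfaces in MSBuild as an ordinary property:

<!-- In a project file or target: $(ExpiringText) resolves to the
     machine-level environment variable of the same name. -->
<Message Text="Shared value: $(ExpiringText)" />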
If you have more than one build machine then it could become too much of a maintenance issue.
Another solution would involve MSBuild response files: you create an .rsp file that holds the property value, and MSBuild picks the value up from the command line.
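For example (file and property names here are illustrative), every build points MSBuild at the same response file with the @file syntax:

# common.rsp - comments start with '#', one switch per line
/p:ExpiringText=the-shared-value

msbuild MyBuild.proj @common.rsp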
You need to place it somewhere all your builds can access, then customize your build process template to read from there (build definitions, as you know, do not have a mechanism to share data between defs).
Some examples would be a file checked into TFS, a file in a known location (file share), web page, web service, etc.
You could even make a custom activity that knows how to read it and outputs the result as an OutArgument (e.g. a custom activity that reads the string from a hardcoded URL).

Shell script generation command based on template

What I want to achieve
The user would provide a command which would do remote execution. The command (the protocol for remote execution) can be SSH/RSH etc., so I want it to be part of a configuration file or a template file (assume the parameters are fixed across protocols), like the sample below.
template.cfg file (as configured by user):
ssh $ip $commandList
I would generate a list of values in another data file which would contain the IP address and the command list, like:
10.182.215.214|echo $UNAME
10.251.142.142|echo $SHELLNAME
I would like a script, call it driver.sh, which when executed generates the actual execution script, execute.sh, by applying the command from the template.
Questions
How can I generate the script based on template/plugin (which can take liberty and provide the command)?
If the data is generated in an online application (C/C++), is there any better way than the normal file-based operation (read from the cfg file and update execute.sh)?
1.
while IFS='|' read -r ip commandList
do
    # expand the template line with the current $ip and $commandList
    eval echo "$(<template.cfg)"
done <data >execute.sh
You may want to quote the variable expansions in the data file.
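With the sample template and data above, the generated execute.sh would contain:

ssh 10.182.215.214 echo $UNAME
ssh 10.251.142.142 echo $SHELLNAME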
2.
Since you want the user-provided command to be part of a configuration file, I see no other way than to read from the cfg file; on the other hand, you may well directly execute the generated commands instead of writing them to an execute.sh.
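For example, the same loop can run each expanded template line immediately rather than collecting the lines into a script:

while IFS='|' read -r ip commandList
do
    # eval expands $ip and $commandList inside the template line and runs it
    eval "$(<template.cfg)"
done <data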
This almost looks as if you're trying to re-implement automated configuration tools like Puppet or Chef. Either beats the ssh loop.
Puppet contains a module called Facter, which is used to report/collect all kinds of data about your remote systems.
All of these tools require some setup (public/private keypairs, software installation).
They both have the advantage of builtin logging - good for audits.

Gradle project with only configuration and no sources

I'd like to create a new Gradle project without any sources. I'm going to put there some configuration files and I want to generate a zip file when I build.
With Maven I'd use the assembly plugin. I'm looking for the easiest and lightest way to do this with Gradle. I wonder if I need to apply the java plugin even if I don't have any sources here, just because it provides some basic and useful tasks like clean, assemble and so on. Generating a zip is pretty straightforward, I know how to do that, but I don't know where and how to put the zip generation within the Gradle world.
I've done it manually until now. In other words, for projects where all I want to do is create some kind of distro and I need the basic lifecycle tasks like assemble and clean, I've simply created those tasks along with the needed dependencies.
But there is the 'base' plugin (mentioned under "Base plugins" of the "Standard Gradle Plugins" in the user's guide) that seems to fit the bill nicely for this functionality. Note though that the user guide mentions that this and the other base plugins are not yet considered part of the Gradle API and are not really documented.
The results are pretty much identical to yours, the only difference being that there are no confusing java specific tasks that always remain UP-TO-DATE.
apply plugin: 'base'

task dist(type: Zip) {
    from('solr')
    into('solr')
}

assemble.dependsOn(dist)
Sample run:
$ gradle clean assemble
:clean
:dist
:assemble
BUILD SUCCESSFUL
Total time: 2.562 secs
As far as I understood, it might sound strange, but it looks like I needed to apply the java plugin in order to create a zip file. Furthermore, it's handy to have some common tasks available, like for example clean. The following is my build.gradle:
apply plugin: 'java'

task('dist', type: Zip) {
    from('solr')
    into('solr')
}

assemble.dependsOn dist
assemble.dependsOn dist
I applied the java plugin and defined my dist task, which creates a zip file containing a solr directory with the content of the solr directory within my project. The last line makes the task execute whenever I run the common gradle build or gradle assemble, since I don't want to call the dist task explicitly.
This way if I work with multiple projects I just need to execute gradle build on the parent to generate all the artifacts, including the configuration zip.
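For reference, that parent build is just a normal Gradle multi-project setup; a sketch of the parent's settings.gradle (project names are illustrative):

// settings.gradle in the parent project
include 'webapp', 'solr-config'   // 'solr-config' holds the zip-only build above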
Please let me know if you have better solutions and add your own answer!
You could just apply the groovy plugin and use Ant. I did something like this; I do also like javanna's answer.
task jars(dependsOn: ['dev_jars']) << {
    def fromDir = file('/database-files/non_dev').listFiles().sort()
    File dist = new File("${project.buildDir}/dist")
    dist.mkdir()
    fromDir.each { File dir ->
        File destFile = new File("${dist.absolutePath}/database-connection-${dir.name}.jar")
        println destFile.getAbsolutePath()
        ant.jar(destfile: destFile, update: false, baseDir: dir)
    }
}

Referencing information in builds specified in a run parameter [Hudson]

Day 1 of using Hudson for our CI build. Slowly but surely getting up to speed.
My question is about run parameters. I've seen that I can use them to reference a particular run of a particular project - that's all fine.
What I don't understand (and can't find any documentation on - there's nothing at Parameterized Build) is how I refer to anything in the run defined by the run parameter.
Essentially I want to reference the %BUILD_NUMBER% and %SVN_REVISION% of the run that is selected in the run parameter.
How can I do that?
Do you really need to add extra property values or extra parameters for your job?
Since BUILD_NUMBER and SVN_REVISION are already defined as environment variables (see Building a software project), you can use those in your job.
When a Hudson job executes, it sets some environment variables that you may use in your shell script, batch command, or Ant script
This illustrates that you already have those values at your disposal.
You can then use them to define other environment variables/properties within your shell or ant script.
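For example, a shell build step can consume them directly (on Windows batch steps the equivalent is %BUILD_NUMBER% and %SVN_REVISION%):

# Hudson exports these for every build of a Subversion-backed job
echo "Building #$BUILD_NUMBER at Subversion revision $SVN_REVISION"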
When it comes to pass a variable value from one job to another, the Parameterized Trigger Plugin should do the trick:
The parameters section can contain a combination of one or more of the following:
a set of predefined properties (see the example after this list)
properties from a properties file read from the workspace of the triggering build
the parameters of the current build
"Subversion revision": makes sure the triggered projects are built with the same revision(s) of the triggering build.
You still have to make sure those projects are actually configured to checkout the right Subversion URLs.
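As a concrete sketch of the first option, the plugin's predefined properties are KEY=value lines in which the usual build variables expand; the parameter names below are illustrative:

UPSTREAM_BUILD_NUMBER=$BUILD_NUMBER
UPSTREAM_SVN_REVISION=$SVN_REVISION

The downstream job then receives UPSTREAM_BUILD_NUMBER and UPSTREAM_SVN_REVISION as ordinary build parameters.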
Note: there might be an issue with the Join Plugin, which might not work when the Parameterized Trigger is in action.