I have some large bash scripts in my job dsl files that I am declaring as
String script = '''
# large script
'''
and calling it from the shell method
However, I would like to break out the scripts into shell files. I tried declaring
String script = new File('script.sh').text
But the job that executes the Jenkins Job DSL script does not appear to find the file; in fact, I am not sure which location it is even executing from.
Use readFileFromWorkspace to read the contents of a file from a job workspace.
The path is specified relative to the workspace root.
The second example on the linked API docs above is for a batch file, but replace batch with shell and you have the solution for your case.
def runScript = readFileFromWorkspace('script.sh')
job('example-2') {
    steps {
        // pass the file contents to the shell build step
        shell(runScript)
    }
}
In my projects I adopt a semantic versioning scheme following the standard described by semver. I obtain something like this: product_v1.2.3-alpha-dirty.elf.
I work with embedded systems, and with make I usually generate a version_autogen.h file at compile time that contains both the version number and the state of the current git repository (e.g. --dirty, --clean and so on), using shell commands.
I'm starting to use Meson, and it is very easy and flexible, but custom commands like
run_command('command', 'arg1', 'arg2', 'arg3')
are available only at configure time while I need them at compile time to retrieve information like git status and similar.
How can I do that?
After some more research I found that custom_target() (as suggested by nielsdg) can do the job. I did something like this:
# versioning
version_autogen_h = custom_target(
    'version_autogen_h',  # target name (required before Meson 0.60)
    output : 'version_autogen.h',
    input : 'version_creator.sh',
    command : ['@INPUT@', '0', '0', '1', 'alpha.1', '@OUTPUT@'],
)
where version_creator.sh is my bash script that retrieves git info and creates the file version_autogen.h given the version numbers passed as the command arguments. The custom target is created at compile time, so my script is executed at compile time as well, exactly when I want it to be.
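For illustration, here is a rough Python stand-in for what a script like version_creator.sh might do. It is only a sketch under assumptions: the macro names (VERSION_MAJOR, VERSION_STRING and so on), the header layout, and the use of `git status --porcelain` to decide between dirty and clean are all hypothetical and are not taken from the original script.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for version_creator.sh; all names are assumptions."""
import subprocess
import sys


def git_state(cwd="."):
    """Return 'dirty' or 'clean' for the work tree, 'unknown' if git fails."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    # Any output from --porcelain means there are uncommitted changes.
    return "dirty" if result.stdout.strip() else "clean"


def render_header(major, minor, patch, prerelease, state):
    """Build the text of the generated header (macro names are made up)."""
    version = f"{major}.{minor}.{patch}"
    if prerelease:
        version += f"-{prerelease}"
    version += f"-{state}"
    return (
        "#ifndef VERSION_AUTOGEN_H\n"
        "#define VERSION_AUTOGEN_H\n"
        f"#define VERSION_MAJOR {major}\n"
        f"#define VERSION_MINOR {minor}\n"
        f"#define VERSION_PATCH {patch}\n"
        f'#define VERSION_STRING "{version}"\n'
        "#endif\n"
    )


def main(argv):
    # Same argument order as the custom_target command:
    # major minor patch prerelease output-file
    major, minor, patch, prerelease, output = argv[1:6]
    with open(output, "w") as handle:
        handle.write(render_header(major, minor, patch, prerelease, git_state()))


if __name__ == "__main__" and len(sys.argv) == 6:
    main(sys.argv)
```

Keeping the header rendering in its own function means the text generation can be checked without a git repository; only git_state() touches the work tree.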
I also discovered that Meson offers generators for something similar, but they transform an input file into one or more output files, so they didn't fit my case, where the input is not a file but just version numbers.
Meson has a specialized command for this job: vcs_tag
This command detects revision control commit information at build time
and places it in the specified output file. This file is guaranteed to
be up to date on every build. Keywords are similar to custom_target.
So it looks a bit shorter, and you can skip the generation script and have just
git_version_h = vcs_tag(input : 'version.h.in',
output : 'version.h')
where version.h.in is a file you provide, containing the @VCS_TAG@ string that will be replaced.
Of course, you can write the file header and naming according to your project style, and possibly add other definitions too. It is also possible to use a different replace string and your own command line to generate the version, e.g.
vcs_tag(command: [
'git', '--git-dir', meson.build_root(),
'describe', '--tags', '--long',
'--match', '?.*.*', '--always'
which I found and adapted from here
I have a Dataflow job which:
Reads a text file from GCS with other filenames in it
Passes the filenames to ReadAllFromParquet to read the .parquet files
Writes to BigQuery
Despite my job 'succeeding' it basically doesn't have an output collection past the ReadAllFromParquet step.
I successfully read the files in a list such as:['gs://my_bucket/my_file1.snappy.parquet','gs://my_bucket/my_file2.snappy.parquet','gs://my_bucket/my_file3.snappy.parquet']
I am also confirming this list is correct and the GCS paths to the files are correct using a logger on the step before ReadAllFromParquet.
That's what my pipeline looks like (omitting the full code for brevity but I am confident that it normally works as I have the exact same pipeline for .csv using ReadAllFromText and it works fine):
with beam.Pipeline(options=pipeline_options_batch) as pipeline_2:
    try:
        final_data = (
            pipeline_2
            | 'Create empty PCollection' >> beam.Create([None])
            | 'Get accepted batch file: {}'.format(runtime_options.complete_batch) >> beam.ParDo(OutputValueProviderFn(runtime_options.complete_batch))
            | 'Read all filenames into a list' >> beam.ParDo(FileIterator(runtime_options.files_bucket))
            | 'Read all files' >> beam.io.ReadAllFromParquet(columns=['locationItemId', 'deviceId', 'timestamp'])
            | 'Process all files' >> beam.ParDo(ProcessSch2())
            | 'Transform to rows' >> beam.ParDo(BlisDictSch2())
            | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
                table=runtime_options.comp_table,
                schema=SCHEMA_2,
                project=pipeline_options_batch.view_as(GoogleCloudOptions).project,  # options.display_data()['project'],
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,  # create if it does not exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,  # add to existing rows, partitioning
            )
        )
    except Exception as exception:
        ...  # handler body not shown in the original post
That's what my job diagram looks like after:
Does somebody have an idea what might be going wrong here and what's the best way to debug?
My ideas currently:
A bucket permissions issue. I noticed the bucket I am reading from is odd as earlier I couldn't download the files despite being a project Owner. The Owners of project only had 'Storage Legacy Bucket Owner'. I added 'Storage Admin' and it then worked fine when manually downloading files with my own account. As per the Dataflow documentation I have ensured that both the default compute service account as well as the dataflow one have 'Storage Admin' on this bucket. However, maybe that's all a red herring as ultimately if there was a permissions issue I should see this in the log and the job would fail?
ReadAllFromParquet expects the file patterns in a different format? I have shown an example of the list I supply above (in my diagram above I can see the input collection correctly shows elements added = 48 for the 48 files in the list). I know this format works for ReadAllFromText, so I assumed that the two are equivalent and that it should work.
Noticed something else potentially consequential. Comparing against my other job which uses ReadAllFromText and works fine I noticed a slight mismatch in the naming that is worrying.
This is the name of the output collection for my working job:
And that's the name on my parquet job that doesn't actually read anything:
Note specifically
Read all files/ReadAllFiles/ReadRange.out0
Read all files/Read all files/ReadRange.out0
The first part of the path is the name of my step for both jobs.
But I believe the second to be the ReadAllFiles class from apache_beam.io.filebasedsource (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsource.py) which both ReadAllFromText and ReadAllFromParquet call.
It seems like a potential bug, but I can't trace it in the source code.
After some more digging it seems that ReadAllFromParquet just isn't functional yet. ReadFromParquet calls apache_beam.io.parquetio._ParquetSource whereas ReadAllFromParquet simply calls
I wonder if there's a way to turn this on if it's an experimental function?
You didn't mention whether you are using the latest Beam SDK; try SDK 2.16 to test the latest changes.
The docs state that ReadAllFromParquet is an experimental function, as is ReadFromParquet; nonetheless, ReadFromParquet is reported as working in this thread: Apache-Beam: Read parquet files from nested HDFS directories. You might want to try using that function instead.
I am attempting to build a file in Puppet 5 using an ERB template. This ERB file uses class variables in the normal fashion, but is also constructed by inserting another Puppet-managed local file. However, I find that whenever I update the inserted file, it takes two Puppet runs to update the ERB-generated file. I want the updating to happen in one Puppet run.
It is easiest to see this with an example:
# test/manifests/init.pp
class test {
  # This file will be inserted inside the next file:
  file { '/tmp/partial.txt':
    source => 'puppet:///modules/test/partial.txt',
    before => File['/tmp/layers.txt'],
  }
  $inserted_file = file('/tmp/partial.txt')
  # This file uses a template and has the above file inserted into it.
  file { '/tmp/layers.txt':
    content => template('test/layers.txt.erb')
  }
}
Here is the template file:
# test/templates/layers.txt.erb
This is a file
<%= @inserted_file %>
If I make a change to the file test/files/partial.txt it takes two Puppet runs for the change to propagate to /tmp/layers.txt. For operational reasons it is important that the update happen in only one Puppet run.
I have tried using various dependencies (before, require, etc.) and even Puppet stages, but everything I tried still requires two Puppet runs.
While it is possible to achieve the same result using an exec resource with sed (or something similar), I would rather use a "pure" Puppet approach. Is this possible?
I am attempting to build a file in Puppet 5 using an ERB template. This ERB file uses class variables in the normal fashion, but is also constructed by inserting another Puppet-managed local file.
A Puppet run proceeds in three main phases:
Fact collection
Catalog building
Catalog application
Puppet manifests are completely evaluated during the catalog building phase, including evaluating all templates and function calls. Moreover, with a master / agent setup, catalog building happens on the master, so that's "the local system" during that phase. All target system modifications happen in the catalog application phase.
Thus your
$inserted_file = file('/tmp/partial.txt')
runs during catalog building, before File[/tmp/partial.txt] is applied. Since you give an absolute path to the file() function, it attempts to use the version already present on the catalog-building system, which is not necessarily even the machine for which the manifest is being built.
It's unclear to me why you want to install and manage the partial result in addition to the full templated file, but if indeed you do, then it seems to me that the best way to do so would be to feed both from the same source instead of trying to feed one from the other. To do this, you can make use of the file function's ability to load data from a file in the (any) module's files/ directory, similar to how File.source can do.
For example,
# test/manifests/init.pp
class test {
  # Reads the contents of a module file (test/files/partial.txt):
  $inserted_file = file('test/partial.txt')
  file { '/tmp/partial.txt':
    content => $inserted_file,
    # resource relationship not necessary
  }
  file { '/tmp/layers.txt':
    # interpolates $inserted_file:
    content => template('test/layers.txt.erb')
  }
}
Note also that the comments in your example manifest are misleading. Neither the file resource you present nor the contents of the file it manages are interpolated into your template, unless incidentally. What is interpolated is the value of the $inserted_file variable of the class that evaluates the template.
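To make the timing argument concrete, here is a toy model of the two approaches in plain Python. It is not Puppet code and only a sketch: the dictionaries stand in for the module's files/ directory and the target node's filesystem, it assumes the catalog is built on the same machine it is applied to (as in the question), and every function name is made up for illustration.

```python
TEMPLATE_HEAD = "This is a file\n"


def compile_catalog(module_files, node_fs, read_from_node):
    """Catalog building: function calls and templates are evaluated here."""
    if read_from_node:
        # Like file('/tmp/partial.txt'): sees the node as it is BEFORE apply.
        inserted = node_fs["/tmp/partial.txt"]
    else:
        # Like file('test/partial.txt'): reads the module's files/ directory.
        inserted = module_files["partial.txt"]
    return {
        "/tmp/partial.txt": module_files["partial.txt"],
        "/tmp/layers.txt": TEMPLATE_HEAD + inserted,
    }


def puppet_run(module_files, node_fs, read_from_node):
    """One run: build the whole catalog first, then apply it to the node."""
    catalog = compile_catalog(module_files, node_fs, read_from_node)
    node_fs.update(catalog)  # catalog application


def runs_until_converged(read_from_node):
    """Count runs needed after partial.txt changes from 'old' to 'new'."""
    module_files = {"partial.txt": "new\n"}
    node_fs = {
        "/tmp/partial.txt": "old\n",
        "/tmp/layers.txt": TEMPLATE_HEAD + "old\n",
    }
    runs = 0
    while node_fs["/tmp/layers.txt"] != TEMPLATE_HEAD + "new\n":
        puppet_run(module_files, node_fs, read_from_node)
        runs += 1
    return runs
```

In this model, runs_until_converged(True) needs two runs because the first compile still sees the old /tmp/partial.txt, while runs_until_converged(False) needs one because both files are fed from the same module source.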
I have a Django app hosted on Heroku. In it, I am using a view written in LaTeX to generate a pdf on-the-fly, and have installed the Heroku LaTeX buildpack to get this to work. My LaTeX view is below.
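import os
import tempfile
from subprocess import PIPE, Popen
from django.http import HttpResponse
from django.template.loader import get_template

def pdf(request):
    context = {}
    template = get_template('cv/cv.tex')
    rendered_tpl = template.render(context).encode('utf-8')
    with tempfile.TemporaryDirectory() as tempdir:
        # pdflatex reads the document from stdin and writes texput.pdf into tempdir
        process = Popen(
            ['pdflatex', '-output-directory', tempdir],
            stdin=PIPE, stdout=PIPE,
        )
        out, err = process.communicate(rendered_tpl)
        with open(os.path.join(tempdir, 'texput.pdf'), 'rb') as f:
            pdf = f.read()
    r = HttpResponse(content_type='application/pdf')
    r.write(pdf)
    return r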
This works fine when I use one of the existing document classes in cv.tex (eg. \documentclass{article}), but I would like to use a custom one, called res. Ordinarily I believe there are two options for using a custom class.
Place the class file (res.cls, in this case) in the same folder as the .tex file. For me, that would be in the templates folder of my app. I have tried this, but pdflatex cannot find the class file. (Presumably because it is not running in the templates folder, but in a temporary directory? Would there be a way to copy the class file to the temporary directory?)
Place the class file inside another folder with the structure localtexmf/tex/latex/res.cls, and make pdflatex aware of it using the method outlined in the answer to this question. I've tried running the CLI instructions on Heroku using heroku run bash, but it does not recognise initexmf, and I'm not entirely sure how to specify a relevant directory.
How can I tell pdflatex where to find the class file?
Just 2 ideas, I don't know if it'll solve your problems.
First, try to put your localtexmf folder in ~/texmf which is the default local folder in Linux systems (I don't know much about Heroku but it's mostly Linux systems, right?).
Second, instead of using initexmf, I usually use texhash, it may be available on your system?
I ended up finding another workaround to achieve my goal, but the most straightforward solution I found would be to change TEXMFHOME at runtime, for example...
TEXMFHOME=/d pdflatex <filename>.tex
...if you had /d/tex/latex/res/res.cls.
Credit goes to cfr on tex.stackexchange.com for the suggestion.
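From a Django view, the same idea could be applied by passing a modified environment to the pdflatex subprocess. This is only a sketch: build_latex_env and render_pdf are hypothetical helper names, and the directory passed as texmf_home is assumed to contain tex/latex/res/res.cls as in the example above.

```python
import os
import subprocess
import tempfile


def build_latex_env(texmf_home, base_env=None):
    """Copy the environment and point TEXMFHOME at the custom texmf tree."""
    env = dict(os.environ if base_env is None else base_env)
    env["TEXMFHOME"] = texmf_home
    return env


def render_pdf(tex_source, texmf_home):
    """Run pdflatex on tex_source (bytes) and return the PDF as bytes."""
    with tempfile.TemporaryDirectory() as tempdir:
        subprocess.run(
            ["pdflatex", "-output-directory", tempdir],
            input=tex_source,               # document is fed on stdin
            env=build_latex_env(texmf_home),
            stdout=subprocess.PIPE,         # keep the pdflatex log off the console
            check=True,                     # raise if pdflatex reports an error
        )
        # With input on stdin, pdflatex uses the default job name "texput".
        with open(os.path.join(tempdir, "texput.pdf"), "rb") as handle:
            return handle.read()
```

Copying os.environ rather than building a fresh dictionary keeps PATH and the other variables the buildpack relies on, so only TEXMFHOME changes for the child process.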
I have a simple Talend job like the one below:
tS3Connection -> tS3Get -> tFileInputDelimited -> tMap -> tAmazonMysqlOutput
Now the scenario is that sometimes I get the file in .txt format and sometimes I get it in a zip file.
So I want to use tFileUnarchive to unzip the file if it is zipped, or bypass the tFileUnarchive component and process the file directly if it is already unzipped, i.e. a plain .txt file.
Any help on this is greatly appreciated.
The trick here is to break the file retrieval and potential unzipping into one sub job and then the processing of the files into another sub job afterwards.
Here's a simple example job:
As normal, you connect to S3 and then you might list all the relevant objects in the bucket using the tS3List and then pass this to tS3Get. Alternatively you might have another way of passing the relevant object key that you want to download to tS3Get.
In the above job I set tS3Get up to fetch every object that is iterated on by the tS3List component by setting the key as:
and then downloading it to:
"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))
The extra bit I've added starts with a Run If conditional link from the tS3Get which links the tFileUnarchive with the condition:
Which checks to see if the file being downloaded from S3 is a .zip file.
The tFileUnarchive component then just needs to be told what to unzip, which will be the file we've just downloaded:
"C:/Talend/5.6.1/studio/workspace/S3_downloads/" + ((String)globalMap.get("tS3List_1_CURRENT_KEY"))
and where to extract it to:
This then puts any extracted files in the same place as the ones that didn't need extracting.
From here we can now iterate through the downloads folder looking for the file types we want by setting the directory to "C:/Talend/5.6.1/studio/workspace/S3_downloads" and the global expression to "*.csv" in my case as I wanted to read in only the CSV files (including the zipped ones) I had in S3.
Finally, we then read the delimited files by setting the file to be read by the tFileInputDelimited component as:
And in my case I simply then printed this to the console but obviously you would then want to perform some transformation before uploading to your AWS RDS instance.
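The control flow of the job described above can also be sketched outside Talend. The following Python is only an illustration of the logic (unarchive when the download ends in .zip, then pick up every matching file from the same folder); the function names are hypothetical and none of this is Talend-generated code.

```python
import glob
import os
import zipfile


def unpack_if_zip(path, extract_dir):
    """Equivalent of the Run If link: only .zip downloads are unarchived."""
    if path.lower().endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(extract_dir)
        return True
    return False


def collect_delimited_files(download_dir, mask="*.csv"):
    """First sub job: unzip where needed. Second sub job: list files by mask."""
    for name in sorted(os.listdir(download_dir)):
        # Extract next to the plain downloads so one file mask finds both kinds.
        unpack_if_zip(os.path.join(download_dir, name), download_dir)
    return sorted(glob.glob(os.path.join(download_dir, mask)))
```

Extracting into the download folder itself is what lets the later file listing treat zipped and unzipped inputs the same way.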