In my projects I adopt a semantic versioning scheme following the standard described by semver. I obtain something like this: product_v1.2.3-alpha-dirty.elf.
I work with embedded systems, and with make I usually generate a version_autogen.h file at compile time that contains both the version number (e.g. 1.4.3.1) and the current git repository state (e.g. --dirty, --clean, and so on), using shell commands.
I'm starting to use meson and it is very easy and flexible, but custom commands like
run_command('command', 'arg1', 'arg2', 'arg3')
are available only at configure time, while I need them at compile time to retrieve information like the git status and similar.
How can I do that?
After deeper research I found that custom_target() (as suggested by nielsdg) can do the job. I did something like this:
# versioning
version_autogen_h = custom_target(
    'version_autogen.h',
    output : 'version_autogen.h',
    input : 'version_creator.sh',
    command : ['@INPUT@', '0', '0', '1', 'alpha.1', '@OUTPUT@'],
)
where version_creator.sh is my bash script that retrieves git info and creates the file version_autogen.h from the version numbers passed as command arguments. The custom target's command runs at compile (build) time, so my script is executed exactly when I want it to be.
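For reference, a rough sketch of what such a script could look like (a sketch under assumptions, not the actual version_creator.sh; the argument order matches the custom_target() command above):

#!/bin/bash
# Sketch only: $1..$4 are the version parts, $5 is the output header path.
MAJOR=$1; MINOR=$2; PATCH=$3; PRERELEASE=$4; OUT=$5
# --dirty appends "-dirty" when the working tree has uncommitted changes.
GIT_DESC=$(git describe --always --dirty 2>/dev/null || echo "unknown")
cat > "$OUT" <<EOF
#ifndef VERSION_AUTOGEN_H
#define VERSION_AUTOGEN_H
#define FW_VERSION "${MAJOR}.${MINOR}.${PATCH}-${PRERELEASE}-${GIT_DESC}"
#endif
EOF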
I also discovered that meson offers generators to do something similar, but they transform an input file into one or more output files, so they didn't fit my case, where I didn't need a file as input but just version numbers.
Meson has a specialized command for this job: vcs_tag.
This command detects revision control commit information at build time
and places it in the specified output file. This file is guaranteed to
be up to date on every build. Keywords are similar to custom_target.
So it would look a bit shorter, with the possibility to avoid a generation script and have just
git_version_h = vcs_tag(input : 'version.h.in',
                        output : 'version.h')
where version.h.in is a file that you should provide, containing the @VCS_TAG@ string that will be replaced, e.g.
#define MYPROJ_VERSION "@VCS_TAG@"
Of course, you can have the file header and naming according to your project style, and possibly add other definitions as well. It is also possible to use another replace string and your own command line to generate the version, e.g.
vcs_tag(command: [
            'git', '--git-dir', meson.build_root(),
            'describe', '--tags', '--long',
            '--match', '?.*.*', '--always'
        ],
        ...
)
which I found and adapted from here
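In either form, the resulting git_version_h can then be listed among a target's sources so that consumers are rebuilt when the generated header changes (the target and source names below are only illustrative):

executable('product',
           'main.c',
           git_version_h)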
Related
I have a Dataflow job which:
Reads a text file from GCS with other filenames in it
Passes the filenames to ReadAllFromParquet to read the .parquet files
Writes to BigQuery
Despite my job 'succeeding' it basically doesn't have an output collection past the ReadAllFromParquet step.
I successfully read the filenames into a list such as: ['gs://my_bucket/my_file1.snappy.parquet', 'gs://my_bucket/my_file2.snappy.parquet', 'gs://my_bucket/my_file3.snappy.parquet']
I am also confirming this list is correct and the GCS paths to the files are correct using a logger on the step before ReadAllFromParquet.
This is what my pipeline looks like (omitting the full code for brevity, but I am confident it normally works, as I have the exact same pipeline for .csv files using ReadAllFromText and it works fine):
with beam.Pipeline(options=pipeline_options_batch) as pipeline_2:
    try:
        final_data = (
            pipeline_2
            | 'Create empty PCollection' >> beam.Create([None])
            | 'Get accepted batch file: {}'.format(runtime_options.complete_batch) >> beam.ParDo(OutputValueProviderFn(runtime_options.complete_batch))
            | 'Read all filenames into a list' >> beam.ParDo(FileIterator(runtime_options.files_bucket))
            | 'Read all files' >> beam.io.ReadAllFromParquet(columns=['locationItemId', 'deviceId', 'timestamp'])
            | 'Process all files' >> beam.ParDo(ProcessSch2())
            | 'Transform to rows' >> beam.ParDo(BlisDictSch2())
            | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
                table=runtime_options.comp_table,
                schema=SCHEMA_2,
                project=pipeline_options_batch.view_as(GoogleCloudOptions).project,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,  # create the table if it does not exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND  # append to existing rows (partitioning)
            )
        )
    except Exception as exception:
        logging.error(exception)
        pass
That's what my job diagram looks like after:
Does somebody have an idea what might be going wrong here and what's the best way to debug?
My ideas currently:
A bucket permissions issue. I noticed the bucket I am reading from is odd: earlier I couldn't download the files despite being a project Owner, because project Owners only had 'Storage Legacy Bucket Owner'. I added 'Storage Admin' and manually downloading files with my own account then worked fine. As per the Dataflow documentation I have ensured that both the default compute service account and the Dataflow one have 'Storage Admin' on this bucket. However, maybe that's all a red herring, as ultimately if there were a permissions issue I should see it in the log and the job would fail?
ReadAllFromParquet expects the file patterns in a different format? I have shown an example of the list I supply above (in my diagram I can see the input collection correctly shows elements added = 48 for the 48 files in the list). I know this format works for ReadAllFromText, so I assumed they are equivalent and should work.
=========
EDIT:
Noticed something else potentially consequential. Comparing against my other job which uses ReadAllFromText and works fine I noticed a slight mismatch in the naming that is worrying.
This is the name of the output collection for my working job:
And that's the name on my parquet job that doesn't actually read anything:
Note specifically
Read all files/ReadAllFiles/ReadRange.out0
vs
Read all files/Read all files/ReadRange.out0
The first part of the path is the name of my step for both jobs.
But I believe the second to be the ReadAllFiles class from apache_beam.io.filebasedsource (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsource.py) which both ReadAllFromText and ReadAllFromParquet call.
Seems like a potential bug, but I don't seem to be able to trace it in the source code.
=============
EDIT 2
After some more digging it seems that ReadAllFromParquet just isn't functional yet. ReadFromParquet calls apache_beam.io.parquetio._ParquetSource whereas ReadAllFromParquet simply calls
apache_beam.io.filebasedsource._ReadRange.
I wonder if there's a way to turn this on if it's an experimental function?
You didn't mention whether you are using the latest Beam SDK; try using SDK 2.16 to test the latest changes.
The doc states that ReadAllFromParquet is an experimental function, as is ReadFromParquet; nonetheless, ReadFromParquet is reported as working in this thread: Apache-Beam: Read parquet files from nested HDFS directories. You might want to try using that function.
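If ReadAllFromParquet keeps misbehaving, one possible workaround (a sketch only, not from the original answer; the file pattern and column names are illustrative) is to read a known pattern directly with ReadFromParquet:

import apache_beam as beam

with beam.Pipeline() as p:
    rows = (
        p
        | 'Read parquet' >> beam.io.ReadFromParquet(
            'gs://my_bucket/my_file*.snappy.parquet',
            columns=['locationItemId', 'deviceId', 'timestamp'])
        | 'Log rows' >> beam.Map(print)  # each element is a dict per record
    )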
I am attempting to build a file in Puppet 5 using an ERB template. This ERB file uses class variables in the normal fashion, but is also constructed by inserting another Puppet-managed local file. However, I find that whenever I update the inserted file, it takes two Puppet runs to update the ERB-generated file. I want the updating to happen in one Puppet run.
It is easiest to see this with an example:
# test/manifests/init.pp
class test {
  # This file will be inserted inside the next file:
  file { '/tmp/partial.txt':
    source => 'puppet:///modules/test/partial.txt',
    before => File['/tmp/layers.txt'],
  }

  $inserted_file = file('/tmp/partial.txt')

  # This file uses a template and has the above file inserted into it.
  file { '/tmp/layers.txt':
    content => template('test/layers.txt.erb'),
  }
}
Here is the template file:
# test/templates/layers.txt.erb
This is a file
<%= @inserted_file %>
If I make a change to the file test/files/partial.txt it takes two Puppet runs for the change to propagate to /tmp/layers.txt. For operational reasons it is important that the update happen in only one Puppet run.
I have tried using various dependencies (before, require, etc.) and even Puppet stages, but everything I tried still requires two Puppet runs.
While it is possible to achieve the same result using an exec resource with sed (or something similar), I would rather use a "pure" Puppet approach. Is this possible?
I am attempting to build a file in Puppet 5 using an ERB template. This ERB file uses class variables in the normal fashion, but is also constructed by inserting another Puppet-managed local file.
A Puppet run proceeds in three main phases:
Fact collection
Catalog building
Catalog application
Puppet manifests are completely evaluated during the catalog building phase, including evaluating all templates and function calls. Moreover, with a master / agent setup, catalog building happens on the master, so that's "the local system" during that phase. All target system modifications happen in the catalog application phase.
Thus your
$inserted_file = file('/tmp/partial.txt')
runs during catalog building, before File[/tmp/partial.txt] is applied. Since you give an absolute path to the file() function, it attempts to use the version already present on the catalog-building system, which is not necessarily even the machine for which the manifest is being built.
It's unclear to me why you want to install and manage the partial result in addition to the full templated file, but if indeed you do, then it seems to me that the best way to do so would be to feed both from the same source instead of trying to feed one from the other. To do this, you can make use of the file function's ability to load data from a file in the (any) module's files/ directory, similar to how a File resource's source attribute can.
For example,
# test/manifests/init.pp
class test {
  # Reads the contents of a module file (modules/test/files/partial.txt):
  $inserted_file = file('test/partial.txt')

  file { '/tmp/partial.txt':
    content => $inserted_file,
    # resource relationship not necessary
  }

  file { '/tmp/layers.txt':
    # interpolates $inserted_file:
    content => template('test/layers.txt.erb'),
  }
}
Note also that the comments in your example manifest are misleading. Neither the file resource you present nor the contents of the file it manages are interpolated into your template, unless incidentally. What is interpolated is the value of the $inserted_file variable of the class that evaluates the template.
My PhpStorm 2017.2 project requires that each new file be created from a specific template. In "Settings >> Editor >> File and Code Templates >> PHP File", I have the following template:
<?php
/**
 * @author John Doe
 * @copyright ${YEAR} Acme
 * @created ${DATE}
 * @modified ${DATE}
 */
This works well. PhpStorm fills in the year and date dynamically. However, when I later come back and make changes to the file, I always need to remember to update the @modified line manually. Is there a way to automate this so that on save or on commit (for version-controlled files) the line is updated with the current value of ${DATE}?
Not possible ATM.
https://youtrack.jetbrains.com/issue/IDEABKL-7178 -- watch this ticket (star/vote/comment) to get notified of any progress. Right now there are no plans to implement something like that in the near future.
On the other hand (as mentioned in a comment on the aforementioned ticket) -- see if the standard "Copyright" plugin will be of any help (I have never used it myself, so I have no idea what exactly it can do).
One possible solution involves writing your own script/program (PHP or whatever other language you can use) that will parse your file (regex matching should do fine here -- no real need to go parsing the file into tokens) and update the info; see the sketch after this list:
look at each line until the matching line is found (some guard logic can be added to limit the number of lines to be parsed: if no matching line is found in the first xx (e.g. 20) lines, assume that this file has no such comment/line);
update the date/time part based on the file modification timestamp.
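A rough sketch of such a script (a hypothetical Python helper, not a ready-made solution; the 20-line guard and the date format are arbitrary choices):

#!/usr/bin/env python3
# Hypothetical helper (not a ready-made solution): rewrite the "@modified"
# line near the top of a file using the file's modification timestamp.
import datetime
import os
import re
import sys

GUARD_LINES = 20  # stop looking if the tag is not found in the first 20 lines

def update_modified(path):
    mtime = datetime.date.fromtimestamp(os.path.getmtime(path))
    with open(path, encoding='utf-8') as fh:
        lines = fh.readlines()
    for i, line in enumerate(lines[:GUARD_LINES]):
        if '@modified' in line:
            lines[i] = re.sub(r'(@modified\s+).*', r'\g<1>' + mtime.isoformat(),
                              line.rstrip('\n')) + '\n'
            with open(path, 'w', encoding='utf-8') as fh:
                fh.writelines(lines)
            return True
    return False  # assume the file has no such comment line

if __name__ == '__main__':
    update_modified(sys.argv[1])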
Once you have such a script -- just use the File Watcher functionality so it gets called on each file modification.
Possible downside: File Watcher gets triggered when a file modification is detected ... which may include changes made outside the IDE (e.g. another editor, a download from a remote host, another VCS branch, etc.). This may lead to unnecessary/unwanted updates.
If the File Watcher functionality is not suitable for whatever reason -- look into grunt watch or the like, where you may easily disable watching (so your script will only be called when your watcher (build runner) is watching).
I'm using Knime 3.1.2 on OSX and Linux for OPENMS analysis (Mass Spectrometry).
Currently, it uses static filename.mzML files manually put in a directory. It usually has more than one file pressed in at a time ('Input FileS' module not 'Input File' module) using a ZipLoopStart.
I want these files to be downloaded dynamically and then pressed into the workflow...but I'm not sure the best way to do that.
Currently, I have a Python script that downloads .gz files (from AWS S3) and then unzips them. I already have variations that can unzip the files into memory using StringIO (and maybe pass them into the workflow from there as data??).
It can also download them to a directory... which maybe can then be used as the source? But I don't know how to tell the ZipLoop to wait and check the directory after the Python script is run.
I could also have the Python script run as a separate entity (outside of KNIME) and then, once the directory is populated, call KNIME... HOWEVER, there will always be a different number of files (maybe 1, maybe 3)... and I don't know how to make the 'Input Files' KNIME node handle an unknown number of input files.
I hope this makes sense.
Thanks!
Thanks to Gábor for getting me on the right track, although I ended up taking a slightly different route after much experimentation.
===
Being new to Knime, I don't know if this is an efficient use of Knime or a complete kludge... but it does work.
So, part of the problem is some of the Knime-specific objects, one of which is called URIDataValue.
A Python Pandas dataframe is, apparently, interchangeable with Knime tables. However, I don't know if there's a way to import one of these URIDataValue objects into Python. So here's what I did...
1. I wrote a Python script that creates a Pandas Dataframe, and populates it with one Column. Everything is a string, including the column header:
from pandas import DataFrame

# Create empty table
T = DataFrame(
    [
        ['file:///Users/.../copy/lfq_spikein_dilution_1.mzML'],
        ['file:///Users/.../copy/lfq_spikein_dilution_2.mzML'],
    ],
)
T.columns = ['URIDataValue']

#print T
output_table = T
That creates this dataframe:
Note: The column name and values are just strings. But it is (apparently) important that the column header be 'URIDataValue'...even though HERE it's just text. If the column name is not 'URIDataValue' the next node doesn't know what to do.
NEXT, the 'output_table' from the 'Python Source' node is patched to a 'String to URI' node, which (apparently and magically) knows to change the entire columns string values to URIDataValues (presumably based on the name of the first column...don't know that for sure).
Finally, the NEW table, with the correct data objects goes to a 'URI to PORT' node...since apparently 'Port' objects and a 'URI' object are different.
This, then, matches the needed input to the ZipLoop... which is normally the output from a static (hard-coded) 'Input Files' node.
Now, to actually solve the question above, I just have to add the code to my 'Python Source' to download and unzip the S3 files, then annotate the dataframe with their locations, and go.
I have no idea what I'm doing, but it worked.
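For illustration, the download-and-unzip step for the 'Python Source' node might look roughly like this (a sketch only: boto3, the bucket name, and the key list are assumptions, not the actual code):

# Rough sketch (assumed names: 'my-bucket', the key prefixes, a temp dir).
import gzip
import os
import shutil
import tempfile

import boto3
from pandas import DataFrame

s3 = boto3.client('s3')
bucket = 'my-bucket'
keys = ['runs/lfq_spikein_dilution_1.mzML.gz', 'runs/lfq_spikein_dilution_2.mzML.gz']

workdir = tempfile.mkdtemp()
local_uris = []
for key in keys:
    gz_path = os.path.join(workdir, os.path.basename(key))
    mzml_path = gz_path[:-3]  # strip the ".gz" suffix
    s3.download_file(bucket, key, gz_path)
    with gzip.open(gz_path, 'rb') as src, open(mzml_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    local_uris.append('file://' + mzml_path)

# Same shape as before: one string column named 'URIDataValue'
output_table = DataFrame(local_uris, columns=['URIDataValue'])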
There are multiple options to let things work:
Convert the files in memory to Binary Object cells using Python; later you can use that in KNIME. (This one, I am not sure is supported, but as I remember it was demoed in one of the last KNIME gatherings.)
Save the files to a temporary folder (Create Temp Dir) using Python and connect the Python node using a flow variable connection to a file reader node in KNIME (which should work in a loop: List Files, check the Iterate List of Files metanode).
Maybe there is already S3 Remote File Handling support in KNIME, so you can do the downloading, unzipping within KNIME. (Not that I know of, but it would be nice.)
I would go with option 2, but I am not so familiar with Python, so for you, probably option 1 is the best. (In case option 3 is supported, that is the best in my opinion.)
I have a file containing some tests that should be run on Go 1.5+.
I am able to get the Go runtime version using runtime.Version() and doing various comparisons.
However, the test file imports golang.org/x/net/http2. The http2 package requires the Request.Cancel field from net/http, but that is only available on Go 1.5+.
That causes these errors in my CI environment causing the build to fail:
../../../golang.org/x/net/http2/transport.go:214: req.Cancel undefined (type *http.Request has no field or method Cancel)
../../../golang.org/x/net/http2/transport.go:218: req.Cancel undefined (type *http.Request has no field or method Cancel)
../../../golang.org/x/net/http2/transport.go:777: req.Cancel undefined (type *http.Request has no field or method Cancel)
I tried adding // +build go1.5 to the top of the file, but it didn't work.
Is there any way I can limit a unit test file so that it is built and tested only on Go 1.5+ systems?
Build constraints are the proper way to do it.
But note that your error messages refer to the http2 package, which was added in Go 1.6, so you need at least a go1.6 build constraint.
The build constraint
// +build go1.5
will cause the file to be compiled on Go 1.5 and onward. So if you want your test file to only compile and run on Go 1.6 and above, then use
// +build go1.6
Also don't forget that:
Constraints may appear in any kind of source file (not just Go), but they must appear near the top of the file, preceded only by blank lines and other line comments. These rules mean that in Go files a build constraint must appear before the package clause.
To distinguish build constraints from package documentation, a series of build constraints must be followed by a blank line.
A working example:
// +build go1.6

package yourpackage
Note that Go 1.17 onward supports a new build-constraint format that has a more explicit syntax and that go fmt is aware of; see https://pkg.go.dev/cmd/go#hdr-Build_constraints for full details.
//go:build go1.17
(don't forget the newline after the build constraint)
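Putting it together, a minimal version-guarded test file might look like this (a sketch; the package and test names are only illustrative, and both constraint syntaxes are included so older toolchains honor it too):

//go:build go1.6
// +build go1.6

// The build constraints above keep this whole file (and its imports)
// out of builds and test runs on older Go versions.
package yourpackage

import "testing"

func TestOnlyOnGo16AndNewer(t *testing.T) {
	t.Log("this test only compiles and runs on Go 1.6+")
}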