In Palantir Foundry, I can see that we can write unit tests using Pytest or TransformRunner. My understanding is that with Pytest we cannot pass a transform's output for unit testing, and with TransformRunner we cannot use the dataset the transform originally runs on; we need some test data. But I would like to run my transform on the whole input dataset it would use in production and run tests on its output. How can I achieve that?
You can't access Foundry datasets from CI; you'll need to have a snippet of the data in a file within your repo and then load it.
test/fixtures/data/input/a.csv
col_a,col_b
1,2
import os

from transforms.api import Input, Output, Pipeline, transform_df
# The exact TransformRunner import path may differ between Foundry versions.
from transforms.verbs.testing.TransformRunner import TransformRunner

TEST_DATA_DIR = os.path.join(os.path.dirname(__file__), '..', '..', 'fixtures', 'data')

def test_runner_single_table(spark_session):
    pipeline = Pipeline()

    @transform_df(
        Output('/test_single_table/output/test'),
        input_a=Input('/test_single_table/input/a'),
    )
    def transform_1(input_a):
        # The transform under test: adds col_c = col_a + col_b
        return input_a.withColumn('col_c', input_a['col_a'] + input_a['col_b'])

    pipeline.add_transforms(transform_1)
    runner = TransformRunner(pipeline, '/test_single_table', TEST_DATA_DIR)
    output = runner.build_dataset(spark_session, '/test_single_table/output/test')
    assert output.first()['col_c'] == 3
TransformRunner translates the Input path into a directory path. In the example above:
TEST_DATA_DIR tells the runner where the data lives in your environment.
'/test_single_table' tells the runner which subpath can be ignored, since that path only exists on Foundry datasets, not within your repo.
input/a is resolved from Input('[ignored_sub_path]/input/a') against the folder structure you defined in your repo.
You can print these properties, and they will show up in the CI checks, if you want to understand them better.
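As an illustration, the resolution amounts to something like the following hypothetical helper (the real TransformRunner does this internally; the function name here is made up):

def resolve_fixture(input_path, ignored_subpath='/test_single_table', data_dir=TEST_DATA_DIR):
    # '/test_single_table/input/a' -> 'input/a'
    relative = input_path[len(ignored_subpath):].lstrip('/')
    # 'input/a' -> '<TEST_DATA_DIR>/input/a', backed by input/a.csv in the repo
    return os.path.join(data_dir, relative)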
I am attempting to build a file in Puppet 5 using an ERB template. This ERB file uses class variables in the normal fashion, but is also constructed by inserting another Puppet-managed local file. However, I find that whenever I update the inserted file, it takes two Puppet runs to update the ERB-generated file. I want the updating to happen in one Puppet run.
It is easiest to see this with an example:
# test/manifests/init.pp
class test {
  # This file will be inserted inside the next file:
  file { '/tmp/partial.txt':
    source => 'puppet:///modules/test/partial.txt',
    before => File['/tmp/layers.txt'],
  }

  $inserted_file = file('/tmp/partial.txt')

  # This file uses a template and has the above file inserted into it.
  file { '/tmp/layers.txt':
    content => template('test/layers.txt.erb'),
  }
}
Here is the template file:
# test/templates/layers.txt.erb
This is a file
<%= @inserted_file %>
If I make a change to the file test/files/partial.txt, it takes two Puppet runs for the change to propagate to /tmp/layers.txt. For operational reasons it is important that the update happen in only one Puppet run.
I have tried using various dependencies (before, require, etc.) and even Puppet stages, but everything I tried still requires two Puppet runs.
While it is possible to achieve the same result using an exec resource with sed (or something similar), I would rather use a "pure" Puppet approach. Is this possible?
A Puppet run proceeds in three main phases:
Fact collection
Catalog building
Catalog application
Puppet manifests are completely evaluated during the catalog building phase, including evaluating all templates and function calls. Moreover, with a master / agent setup, catalog building happens on the master, so that's "the local system" during that phase. All target system modifications happen in the catalog application phase.
Thus your
$inserted_file = file('/tmp/partial.txt')
runs during catalog building, before File[/tmp/partial.txt] is applied. Since you give an absolute path to the file() function, it attempts to use the version already present on the catalog-building system, which is not necessarily even the machine for which the manifest is being built.
It's unclear to me why you want to install and manage the partial file in addition to the full templated file, but if indeed you do, then the best way is to feed both from the same source instead of trying to feed one from the other. To do this, you can make use of the file() function's ability to load data from a file in any module's files/ directory, similar to what a file resource's source attribute can do.
For example,
# test/manifests/init.pp
class test {
  # Reads the contents of the module file test/files/partial.txt:
  $inserted_file = file('test/partial.txt')

  file { '/tmp/partial.txt':
    content => $inserted_file,
    # resource relationship not necessary
  }

  file { '/tmp/layers.txt':
    # the template interpolates $inserted_file:
    content => template('test/layers.txt.erb'),
  }
}
Note also that the comments in your example manifest are misleading. Neither the file resource you present nor the contents of the file it manages are interpolated into your template, except incidentally. What is interpolated is the value of the $inserted_file variable of the class that evaluates the template.
I have a Django app hosted on Heroku. In it, I am using a view written in LaTeX to generate a pdf on-the-fly, and have installed the Heroku LaTeX buildpack to get this to work. My LaTeX view is below.
import os
import tempfile
from subprocess import PIPE, Popen

from django.http import HttpResponse
from django.template.loader import get_template

def pdf(request):
    context = {}
    template = get_template('cv/cv.tex')
    rendered_tpl = template.render(context).encode('utf-8')
    with tempfile.TemporaryDirectory() as tempdir:
        # pdflatex reads the rendered source from stdin; with no jobname,
        # the output lands in <tempdir>/texput.pdf
        process = Popen(
            ['pdflatex', '-output-directory', tempdir],
            stdin=PIPE,
            stdout=PIPE,
        )
        out, err = process.communicate(rendered_tpl)
        with open(os.path.join(tempdir, 'texput.pdf'), 'rb') as f:
            pdf = f.read()
    r = HttpResponse(content_type='application/pdf')
    r.write(pdf)
    return r
This works fine when I use one of the existing document classes in cv.tex (e.g. \documentclass{article}), but I would like to use a custom one, called res. Ordinarily, I believe there are two options for using a custom class.
Place the class file (res.cls, in this case) in the same folder as the .tex file. For me, that would be in the templates folder of my app. I have tried this, but pdflatex cannot find the class file. (Presumably because it is not running in the templates folder, but in a temporary directory? Would there be a way to copy the class file to the temporary directory?)
Place the class file inside another folder with the structure localtexmf/tex/latex/res.cls, and make pdflatex aware of it using the method outlined in the answer to this question. I've tried running the CLI instructions on Heroku using heroku run bash, but it does not recognise initexmf, and I'm not entirely sure how to specify a relevant directory.
How can I tell pdflatex where to find the class file?
Just two ideas; I don't know whether they'll solve your problem.
First, try putting your localtexmf folder in ~/texmf, which is the default local texmf tree on Linux systems (I don't know much about Heroku, but it's mostly Linux systems, right?).
Second, instead of using initexmf, I usually use texhash; maybe it is available on your system?
I ended up finding another workaround to achieve my goal, but the most straightforward solution I found would be to change TEXMFHOME at runtime, for example...
TEXMFHOME=/d pdflatex <filename>.tex
...if you had /d/tex/latex/res/res.cls.
Credit goes to cfr on tex.stackexchange.com for the suggestion.
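In the Django view above, the same idea can be wired in by handing Popen a modified environment. A minimal sketch, assuming the class file is checked into the repo at texmf/tex/latex/res/res.cls (that location and the use of Django's BASE_DIR setting are assumptions):

import os
from subprocess import PIPE, Popen

# Assumption: the repo ships a texmf tree at <project root>/texmf,
# i.e. texmf/tex/latex/res/res.cls; BASE_DIR is Django's usual settings value.
env = dict(os.environ, TEXMFHOME=os.path.join(BASE_DIR, 'texmf'))
process = Popen(
    ['pdflatex', '-output-directory', tempdir],
    stdin=PIPE,
    stdout=PIPE,
    env=env,  # pdflatex now searches the repo's texmf tree for res.cls
)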
I have some large bash scripts in my Job DSL files that I am declaring as
String script = '''
# large script
'''
and calling it from the shell method
shell(script)
However, I would like to break the scripts out into separate shell files. I tried declaring
String script = new File('script.sh').text
But the job that executes the Jenkins Job DSL script does not appear to find the file; in fact, I am not sure which location it is even executing from.
Use readFileFromWorkspace to read the contents of a file from a job workspace.
The path is specified relative to the workspace root.
The second example on the linked API docs above is for a batch file, but replace batch with shell and you have the solution for your case.
def runScript = readFileFromWorkspace('script.sh')

job('example-2') {
    steps {
        shell(runScript)
    }
}
Originally we used Redmine as our issue management system; now we are planning to migrate to Tuleap.
Both systems have features to import/export issues as a .csv file.
I want to know whether there is a standard / simple way to migrate the issues.
The main items inside an issue are status, title and description.
What kind of data are "remaining_effort" and "cross_references" in Redmine?
Since both systems can export a csv file containing the headers they need, but some of the headers differ, you need a script to map the fields from one system to the other; a code snippet is shown below.
This approach can also work for other ALM systems that don't support migration in the application itself.
#!/usr/bin/env python3
import csv
import sys

# Read a sample Tuleap csv header to learn which fields Tuleap expects
with open('tuleap.csv', newline='') as tuleapcsvfile:
    reader = csv.DictReader(tuleapcsvfile)
    # Remove items that can't be imported back into Tuleap
    to_del = ["remaining_effort", "cross_references"]
    issueheader = [i for i in reader.fieldnames if i not in to_del]

# Write the converted csv to stdout
w = csv.DictWriter(sys.stdout, fieldnames=issueheader, lineterminator="\n")
w.writeheader()

# Read the Redmine csv export and convert row by row
with open('redmine.csv', newline='') as redminecsvfile:
    redminereader = csv.DictReader(redminecsvfile)
    for row in redminereader:
        newrow = {}
        if row['Status'] == 'New':
            newrow['status'] = "Not Started"
        # some simple one-to-one mappings
        newrow['i_want_to'] = row['Subject']
        newrow['so_that'] = row['Description']
        w.writerow(newrow)
Some items in the exported csv, such as remaining_effort and cross_references, can't be imported back into Tuleap.
These two items appear inside the .csv file exported from Tuleap issues.
I had the same issue, and the csv solution looked too limited to me:
the field matching between the tracker and the csv content must fit exactly
you can't import attachments
you can't link artifacts
...
Issues can be extracted from Redmine using the REST API or by directly reading the SQL database. Artifacts can be created in Tuleap using the REST API. You "just" need a script in the middle to extract issues from Redmine and then import them into Tuleap.
I created such a script in Python:
It has a plugin approach so that it could import issues/bugs from any bug tracker and later save them to any other bug tracker.
For now it only supports extracting issues from a Redmine SQL database and exporting them to Tuleap using the REST API.
One could extend it (with a new plugin) to extract issues from other trackers (Bugzilla/Mantis/GitLab).
One could extend it (with a new plugin) to generate a Tuleap XML file rather than importing the artifacts via the Tuleap REST API (XML being more powerful here).
I ported hundreds of issues from Redmine to Tuleap using this and it was good enough for my needs.
Have a look at https://github.com/jpo38/TrackerIO.
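If you only need the bare bones of such a middle script, a rough sketch follows. The endpoints shown (Redmine's /issues.json, Tuleap's /api/artifacts) exist, but the URLs, tracker id, field ids, and keys are placeholder assumptions you must adapt to your instances:

import requests

REDMINE_URL = 'https://redmine.example.com'  # placeholder
TULEAP_URL = 'https://tuleap.example.com'    # placeholder
TRACKER_ID = 123                             # placeholder Tuleap tracker id

# Pull issues from Redmine's REST API (paginated; first 100 shown here)
issues = requests.get(
    f'{REDMINE_URL}/issues.json',
    params={'key': 'YOUR_REDMINE_API_KEY', 'limit': 100},
).json()['issues']

# Create one Tuleap artifact per Redmine issue via the Tuleap REST API
for issue in issues:
    payload = {
        'tracker': {'id': TRACKER_ID},
        'values': [
            # field_id values depend on your tracker configuration
            {'field_id': 1, 'value': issue['subject']},
            {'field_id': 2, 'value': issue['description']},
        ],
    }
    requests.post(
        f'{TULEAP_URL}/api/artifacts',
        json=payload,
        headers={'X-Auth-AccessKey': 'YOUR_TULEAP_ACCESS_KEY'},
    )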
I want to write some unit tests, so I set up a list of values that should all be asserted true, just like in this question. But I want to run it in PyCharm (by pressing Alt+Shift+F10).
If I just use the code from the answers, I get "No tests were found".
You need to double-check the settings of your test run configuration:
By default, PyCharm inspects files whose names start with test and looks for subclasses of unittest.TestCase; however, you can control the Pattern and the subclasses option.
Change Pattern according to your test file names; it accepts a Python regular expression.
Note that PyCharm only inspects classes that inherit from unittest.TestCase, so you should write your tests inside such a class.
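For example, a minimal sketch of wrapping a list of checks in a unittest.TestCase so PyCharm's runner can discover it (the file and class names here are illustrative):

# test_values.py -- the name matches PyCharm's default "test*" pattern
import unittest

class TestValues(unittest.TestCase):
    def test_all_values_true(self):
        values = [1 < 2, 'a' in 'abc', bool('x')]  # sample checks
        for i, value in enumerate(values):
            with self.subTest(case=i):  # reports each failing case separately
                self.assertTrue(value)

if __name__ == '__main__':
    unittest.main()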
PyCharm 2019.1+ and pytest
First, create a file named pytest.ini in order to set up custom configuration. For example, by default pytest treats any file matching the test_*.py and *_test.py globs as a test module, so I encourage you to have this file in order to define your custom file name patterns.
pytest.ini
[pytest]
python_files = test_*.py tests.py *_test.py
Now, open up the Run/Configuration window:
Add a new configuration, select Python tests and pytest:
In the following window, choose a name for your configuration. You can also choose the target, but if you want pytest to use the pytest.ini file, do not select Script path. Then Apply and OK.
Finally, run the test by clicking the Play button.
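With that configuration in place, plain function-style tests are enough for pytest; a minimal sketch (the file name tests.py matches the python_files patterns above, and the checks are illustrative):

# tests.py -- matched by the "tests.py" entry in python_files
def test_values_are_true():
    values = [1 < 2, 'a' in 'abc']  # sample checks
    assert all(values)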