How can we check the output of BashOperator in Airflow? - airflow-scheduler

I'm very new to Airflow. If I execute a BashOperator, how can I get the console output of that operator? Does setting xcom_push=True solve the problem?
I'd be very glad if anyone can answer this :)

If xcom_push is True, the last line written to stdout (and only the last line) is also pushed to an XCom when the bash command completes. The full console output is still written to the task log.
Related source code: https://github.com/apache/airflow/blob/1.10.11/airflow/operators/bash_operator.py#L167-L168

To add to @kaxil's answer (as I don't have enough reputation to comment on it):
If your bash_command returns multiple lines that you want to include in the XCom, add this to return everything on one line (tr maps single characters, so each newline becomes one |):
bash_command="<COMMAND> | tr '\n' '|'",
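The difference between keeping only the last stdout line and joining all lines with tr can be tried outside Airflow. The following is a minimal Python sketch, not Airflow's own code: the helper name last_stdout_line is made up for illustration, and it assumes sh, printf and tr are available on the machine.

```python
import subprocess


def last_stdout_line(command):
    """Run a shell command and return only the last line of its stdout."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[-1] if lines else ""


# A command that prints three lines: a, b, c.
multi = "printf 'a\\nb\\nc\\n'"

only_last = last_stdout_line(multi)                    # just the final line
joined = last_stdout_line(multi + " | tr '\\n' '|'")  # all lines on one line

print(only_last)
print(joined)
```

Because tr also replaces the final newline, the joined value ends with a trailing |, which the consumer of the XCom would have to strip.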

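For those using Airflow 2+: xcom_push has been renamed to do_xcom_push, and it is true by default, so it does not need to be specified. Note that the 2.x BashOperator documentation still describes the pushed XCom value as the last line written to stdout, so check what your version returns before relying on getting the entire output.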

How to stop Prettier from removing ending newline

This is a weird one, as everything I have read seems to indicate the opposite: that Prettier will always add a newline and there's nothing the user can do about this.
However, somehow my install of Prettier does the opposite. I save a file with a new line ending and exit. Running yarn prettier 'src/**' --write shows that this new line then gets removed in my file.
I'm not running any sort of automatic code formatting tools, so this isn't my editor or anything. I just use a plain Vim install.
This is really frustrating as I do need those new lines but have to avoid using Prettier as it removes them for some reason.
I guess it doesn't remove the newlines, even though I can't find a single tool that actually displays it properly.
I am seeing the $ as mentioned in this comment which seems to mean it's working.
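One way to check this without depending on how a tool displays the file is to look at the last byte directly. The following is a minimal Python sketch; the helper name ends_with_newline and the temporary file names are made up for illustration and are not part of Prettier.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file's last byte is a line feed."""
    with open(path, "rb") as handle:
        return handle.read().endswith(b"\n")


# Demonstration on two throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    with_newline = os.path.join(tmp, "with_newline.js")
    without_newline = os.path.join(tmp, "without_newline.js")
    with open(with_newline, "wb") as handle:
        handle.write(b"const a = 1;\n")
    with open(without_newline, "wb") as handle:
        handle.write(b"const a = 1;")
    has_newline = ends_with_newline(with_newline)
    lacks_newline = ends_with_newline(without_newline)

print(has_newline, lacks_newline)
```

Running ends_with_newline on a file before and after the Prettier run would show whether the trailing newline really changed.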

How to replace extension in bash regex?

I'm trying to write a bash script to calculate some biological stuff. I have a problem with regex, and bash is still a little unfamiliar to me. Unfortunately I have no time to learn it properly because I need results immediately.
So my files:
RV30.afa
resFilesRV30.fasta
RV30213.afa
resFilesRV30213.fasta
RV30441.afa
resFilesRV30441.fasta
...
Command I have to use:
mscore -cftit RV30.afa resFilesRV30.fasta >FinalRV30.txt
What I have now:
#!/bin/bash
parallel 'mscore -cftit {} resFile{}.fasta >final{.}.txt' ::: RV*.afa
The problem is:
resFile{}.fasta tries to open a file like resFileXXX.afa.fasta. I need to drop the extension (.afa) in the second argument and replace it with ".fasta".
I didn't find good advice on Google for my problem (or I can't adapt it to my script yet), and I'm running out of time to get results, so I would be grateful for help in solving this. On that basis I will be able to solve some of the other problems that appeared in my other scripts.
This should work for you by substituting {} with {.} in the 2nd argument:
parallel 'mscore -cftit {} resFiles{.}.fasta >final{.}.txt' ::: RV*.afa
As in the output redirection you already use, the replacement string {.} expands to the input with its extension removed.
Does this not work?
for afafile in RV*.afa; do name="${afafile%.afa}"; mscore -cftit "$afafile" "resFiles${name}.fasta" > "Final${name}.txt"; done
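The name mapping that both answers rely on (strip .afa, then build the .fasta and .txt names) can be checked on its own. The following is a small Python sketch of that mapping only, not a replacement for parallel; the function name mscore_arguments is made up, and the file names are the ones listed in the question.

```python
from pathlib import Path


def mscore_arguments(afa_name):
    """Build the input, reference and output names for one .afa file."""
    stem = Path(afa_name).stem  # "RV30.afa" -> "RV30"
    return afa_name, "resFiles{}.fasta".format(stem), "Final{}.txt".format(stem)


# Print the command line that would be run for each input file.
for name in ["RV30.afa", "RV30213.afa", "RV30441.afa"]:
    afa, fasta, out = mscore_arguments(name)
    print("mscore -cftit {} {} >{}".format(afa, fasta, out))
```

Printing the commands first, before executing anything, is a cheap way to confirm that the second argument no longer contains .afa.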

Python 2.7.15 adds a new line in Windows to the end but not on Linux. How do I fix this?

This is not a typical "how do I strip a new line or spaces" question...
I am new to python in general. But I am aware of
print ("test", end="")
print ("test", end="")
for Python 3 and
print "test",
print "test",
for Python 2
The Python 3 implementation prints correctly on both Linux and Windows machines; however, the Python 2 implementation adds an extra line at the end of the execution on Windows-based machines (but not on Linux, where it prints correctly). Is there any way to get rid of this new line?
I have searched around and I can't seem to find anyone talking about this particular issue. Here is a screenshot for demonstration:
So, according to the print documentation:
Standard output is defined as the file object named stdout
We can probably assume that Python I/O is system dependent, which is one way to guess at an explanation for this situation, even though the print documentation states:
A '\n' character is written at the end, unless the print statement
ends with a comma.
Or the reason is that Windows and Linux treat the print statement differently (print is a statement in Python 2 and a function call in Python 3).
Back to the question of how to get rid of this line:
I used the future statement for the print function:
from __future__ import print_function
print('test', end=' ')
print('test', end='')
If I find a reasonable explanation, I will update the answer (there should be one somewhere!).
After speaking to a few people about this and doing some research, it appears the most straightforward way around this issue is to directly use:
sys.stdout.write()
instead of print. You can then format your output much as you would in C/C++ or Java.
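As a rough illustration of that approach, the sketch below writes two words with sys.stdout.write and nothing else, so no newline is added unless the code writes one. It is written for Python 3 (the in-memory check uses io.StringIO); the sys.stdout.write call itself is also available in Python 2.7. The helper name write_words is made up for this example.

```python
import io
import sys


def write_words(words, stream=sys.stdout):
    """Write the words separated by single spaces, with no trailing newline."""
    stream.write(" ".join(words))


# Capture the output in memory to see exactly which characters are written.
buffer = io.StringIO()
write_words(["test", "test"], stream=buffer)
captured = buffer.getvalue()

# The same call against the real standard output.
write_words(["test", "test"])
```

If a final newline is wanted, it has to be written explicitly, for example with sys.stdout.write("\n").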

How to Set Custom Delimiter in PIG

What is the correct syntax to set a custom TextInputFormat delimiter in Pig? I've tried several variations on the following, but it's treated as a literal string instead of a carriage return and line feed.
set textinputformat.record.delimiter '\r\n';
Pig Version is 0.12.0-cdh5.9.0 and Hadoop Version is 2.6.0-cdh5.9.0
Not ideal but a workaround:
Create a properties file like myprops.properties which contains the following line: textinputformat.record.delimiter=\r\n
Then run your script like: pig -P ~/myprops.properties -f path/to/pigscript.pig
It looks like this is a known issue as mentioned in the fourth paragraph of the fourth comment of: PIG_4572
Here is the syntax
SET textinputformat.record.delimiter '<delimiter>';
This works for me
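The symptom described in the question, the delimiter being taken as a string rather than as CR LF, can be pictured with a short Python sketch. This is only an illustration of the difference between the two delimiters, not Pig's actual parsing; the sample data is made up.

```python
# Three records separated by carriage return + line feed.
data = "row1\r\nrow2\r\nrow3"

literal = r"\r\n"  # four characters: backslash, r, backslash, n
crlf = "\r\n"      # two characters: carriage return, line feed

# The literal text never occurs in the data, so nothing is split.
split_on_literal = data.split(literal)
# The real CR LF pair separates the three records.
split_on_crlf = data.split(crlf)

print(len(literal), len(crlf))
print(split_on_literal)
print(split_on_crlf)
```

If the escape sequence is not interpreted, the whole input ends up as a single record, which matches the behavior reported above.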

Delete all lines up to some regex match

I want to delete everything from the start of the document up to some regex match, such as _tmm. I wrote the following custom command:
command! FilterTmm exe 'g/^_tmm\\>/,/^$/mo$' | norm /_tmm<CR> | :0,-1 d
This doesn't work as expected. But when I execute these commands directly using the command line, they work.
Do you have any alternative suggestions to accomplish this job using custom commands?
It seems that you want to delete from the beginning of the file to the line above the matched line.
A /pattern search can take an offset, as in /pattern/{offset} (see :h / for details). For your needs, you could do this, no matter where your cursor is:
ggd/_tmm/-1<cr>
EDIT
I read your question again, and it seems that you want to do it in a single command line.
Your script has a problem: normal cannot be followed by | and another command, because it treats | as part of its argument, so it must be the last command.
Try this line and see if it works for you:
exe 'norm gg'|/_tmm/-1|0,.d
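For comparison outside Vim, the same operation (drop every line before the first one that matches, keep the matching line and the rest) can be sketched in Python. The function name drop_until_match and the sample lines are made up, and returning the input unchanged when nothing matches is a choice made for this sketch.

```python
import re


def drop_until_match(lines, pattern):
    """Return the lines starting at the first one that matches pattern."""
    regex = re.compile(pattern)
    for index, line in enumerate(lines):
        if regex.search(line):
            return lines[index:]
    return lines  # no match: leave the input untouched (a choice made here)


sample = ["header", "notes", "_tmm start", "body"]
kept = drop_until_match(sample, r"^_tmm\b")
print(kept)
```

Here the pattern ^_tmm\b plays the role of the ^_tmm\> used in the original command.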