Unable to use custom MapReduce jar files in Cosmos - mapreduce

I created my own MapReduce jar file and tested it successfully in Cosmos' old Hadoop cluster using the HDFS shell commands. The next step was to test the same jar in the new cluster, so I uploaded it to the new cluster's HDFS, into my home folder (/user/my.username).
When I try to start a MapReduce job using the curl POST below,
curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/my.username/jobs" -d '{"jar":"dt.jar","class_name":"DistanceTest","lib_jars":"dt.jar","input":"input","output":"output"}' -H "Content-Type: application/json" -H "X-Auth-Token: xxxxxxxxxxxxxxxxxxx"
I get:
{"success":"false","error":255}
I tried different path values for the jar and I get the same result. Do I have to upload my jar somewhere else, or am I missing some necessary steps?

There was a bug in the code, already fixed in the global instance of FIWARE Lab.
I've tested this:
$ curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/myuser/jobs" -d '{"jar":"mrjars/hadoop-mapreduce-examples.jar","class_name":"wordcount","lib_jars":"mrjars/hadoop-mapreduce-examples.jar","input":"testdir","output":"outputdir"}' -H "Content-Type: application/json" -H "X-Auth-Token: mytoken"
{"success":"true","job_id": "job_1460639183882_0016"}
Please observe that in this case, mrjars/hadoop-mapreduce-examples.jar is a relative path under hdfs:///user/myuser. Always use relative paths with this operation.
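The resolution rule can be sketched locally: Tidoop joins the relative jar path to the caller's HDFS home directory. A minimal illustration (the variable names are mine, not part of the API):

```shell
# The "jar" field in the request is resolved against the user's HDFS home.
USER_NAME="myuser"
JAR="mrjars/hadoop-mapreduce-examples.jar"   # relative path, no leading slash
ABSOLUTE="hdfs:///user/${USER_NAME}/${JAR}"
echo "${ABSOLUTE}"
```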
This works because the jar had been uploaded under that directory beforehand:
$ curl -X GET "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/myuser/mrjars?op=liststatus&user.name=myuser" -H "X-Auth-Token: mytoken"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"hadoop-mapreduce-examples.jar","type":"FILE","length":270283,"owner":"myuser","group":"myuser","permission":"644","accessTime":1464702215233,"modificationTime":1464702215479,"blockSize":134217728,"replication":3}]}}
The result:
$ curl -X GET "http://storage.cosmos.lab.fiware.org:14000/webhdfs/v1/user/myuser/outputdir?op=liststatus&user.name=myuser" -H "X-Auth-Token: mytoken"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"_SUCCESS","type":"FILE","length":0,"owner":"myuser","group":"myuser","permission":"644","accessTime":1464706333691,"modificationTime":1464706333706,"blockSize":134217728,"replication":3},{"pathSuffix":"part-r-00000","type":"FILE","length":128,"owner":"myuser","group":"myuser","permission":"644","accessTime":1464706333264,"modificationTime":1464706333460,"blockSize":134217728,"replication":3}]}}
I'll upload the fix to the Cosmos repo in GitHub ASAP.

Related

How to automate the bitbucket repository to databricks repos by using ci/cd pipeline

Can someone help with how to automate a CI/CD pipeline that updates and creates files in Databricks Repos from Bitbucket repositories?
If you want to sync changes from a Bitbucket repository into Databricks Repos, then you have the following possibilities:
Use the databricks repos update command of the databricks-cli package, like this:
databricks repos update --path /Repos/user/repository --branch <branch_name>
Use the Update command of the Repos API. It's lower-level because it doesn't work with paths: you need to know the repository ID, which can be obtained via the Workspace API. So it takes several commands instead of a single one:
curl -s -n -X GET -o /tmp/staging-repo-info.json "$DATABRICKS_HOST/api/2.0/workspace/get-status" -H "Authorization: Bearer $DATABRICKS_TOKEN" -d '{"path":"/Repos/Staging/databricks-nutter-projects-demo"}'
export STAGING_REPOS_ID=$(cat /tmp/staging-repo-info.json|grep '"object_type":"REPO"'|sed -e 's|^.*"object_id":\([0-9]*\).*$|\1|')
curl -s -n -X PATCH -o "/tmp/$(Build.SourceBranchName)-out.json" "$DATABRICKS_HOST/api/2.0/repos/$STAGING_REPOS_ID" \
-H "Authorization: Bearer $DATABRICKS_TOKEN" -d "{\"branch\": \"$(Build.SourceBranchName)\"}"
P.S. You can find an end-to-end CI/CD demo with Repos and Azure DevOps in this repository. Although it's not Bitbucket, the structure of the pipeline remains the same.

Post comment with custom CodeBuild build information to GitHub PR

A CodeBuild build gets triggered by a new commit in a GitHub PR via webhook. The build uses a buildspec.yml file for the steps it needs to run. Then it automatically posts a fail/success status back to the PR.
Is it possible to send a comment back to the PR after the build completes with some custom information, such as version, link to the version, link to the logs, etc?
I added the following to the shell script run from buildspec:
curl -s -H "Authorization: token ${TOKEN}" \
-X POST -d "{\"body\": \"Sample Comment\"}" \
"https://api.github.com/repos/${OWNER}/${REPO}/issues/${PR_NUMBER}/comments"

Uploading only html files using gsutil?

This seems like a really basic question, but I can't seem to make it work.
I am using a static site generator for a website. I want to set all my HTML files to never be cached and all the rest to be cached. To do this, I'd like to upload all non-HTML files and set the cache headers. This is straightforward using:
gsutil -m -h "Cache-Control:public, max-age=31536000" rsync -x ".*\.html$" -r dist/ gs://bucket/
But how do I then upload only my HTML files? I've tried cp and rsync with wildcards, but when I try something like:
gsutil -h "Content-Type:text/html" -h "Cache-Control:private, max-age=0, no-transform" rsync -r 'dist/**.html' gs://bucket/
I get: CommandException: Destination ('dist/**.html') must match exactly 1 URL
You want to copy the files into the bucket, so you have to use the "cp" command.
Try the following command:
gsutil -h "Content-Type:text/html" -h "Cache-Control:private, max-age=0, no-transform" cp dist/**.html gs://YOUR_BUCKET

Chrome Postman - how to avoid unnecessary headers

I am trying to build a curl request replica using the Postman extension in Chrome.
Even for a simple POST with -d, it adds a Postman-Token header. How can I avoid this? Thanks
curl -X POST -H Cache-Control:no-cache -H Postman-Token:494ce988-48f7-67b4-4b8c-90f63c4668f1 -d 'code=newcode' http://127.0.0.1:8000/snippets/6/
In the settings dialog, there is a 'Send postman-token header' option.
This is only available in the packaged app (https://chrome.google.com/webstore/detail/postman-rest-client-packa/fhbjgbiflinjbdggehcddcbncdddomop?hl=en), though, not in the in-Chrome version.

upload image to imagefield with djangorestframework using json and test this with CURL

I have made several APIs in djangorestframework.
I could test these both with the HTML form of the API and with curl on the command line.
Now I have an API for a model where one of the fields is an ImageField.
I can't figure out which curl command to use.
Using the syntax I used before for POST actions in JSON format, it would be:
curl -X POST -S -H 'Content-Type: application/json' -u "username:password" --data-binary '{"otherfields":"something", "photo":"/home/michel/test.jpg"}' 127.0.0.1:8000/api/v1/
but in this case the photo will not be saved and is left empty (the photo is an optional field).
When adding -T /home/michel/test.jpg,
I get an error message saying 127.0.0.1:8000/api/v1/test.jpg does not exist as a URL.
In the test HTML form of djangorestframework, everything works fine.
Using the -F option, it says I can only do one request at a time...
I also removed the datatype from data-binary.
Can anybody help me make this curl POST with both the image and the other data in one command?
After a long puzzle, this seems to do the trick:
put all JSON arguments in separate -F arguments;
only use the Accept header (not Content-Type);
specify the image type;
and use @ to indicate the local file to upload:
curl -X POST -S \
-H 'Accept: application/json' \
-u "username:password" \
-F "otherfields=something" \
-F "photo=#/home/michel/test.jpg;type=image/jpg" \
http://127.0.0.1:8000/api/v1/
By the way, I know all of this is in the curl documentation, but I just missed an example of all those things together, since there are a lot of options to try out.