Using mssql-tools bcp from HDFS NFS mount

Using mssql-tools bcp from HDFS NFS mount - hdfs

Trying to run bcp from the mssql-tools package (using centos7) to export tsv file data from an HDFS location mounted to local FS via NFS Gateway, but running into errors like...
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unable to open BCP error-file
or
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unable to open BCP host data-file
The bcp command being run looks like...
/opt/mssql-tools/bin/bcp "$TABLE" in \
"$filename" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE \
-t "\t" \
-e ${filename}.bcperror.log
# with the actual commmand w/ variables resolved looks like...
/opt/mssql-tools/bin/bcp "ACCOUNT" in \
"/HDFS_NFS/path/to/tsv/1_0_0.tsv" \
-D -S MyMSSQLServer \
-U myuser -P mypassword \
-d SOME_MSSQL_DB \
-c \
-t \t \
-e /HDFS_NFS/path/to/store/errlogs/1_0_0.tsv.bcperror.log
all of this seems fine to me, yet also sometimes getting errors like...
/opt/mssql-tools/bin/bcp: unknown option
usage: /opt/mssql-tools/bin/bcp {dbtable | query} {in | out | queryout | format} datafile ...
so not sure what that's about either. My /etc/odbc.ini file looks like...
[MyMSSQLServer]
Driver=ODBC Driver 17 for SQL Server
Description=My MS SQL Server
Trace=No
Server=<the server's IP>
Anyone know any further debugging tips or fixes for this?

The problem appears to be that the error logging file specified by the -e option was already existing in the location specified and that HDFS (mounted via NFS or not) did not like the bcp command trying to overwrite it. You would normally do something like
hadoop fs -put -f /some/local/file /hdfs/location/for/file
and I assume that bcp was attempting something else via the NFS gateway that was not this. I suppose there also could have been latency problems with bcp accessing the HDFS NFS location. Running the bcp command without the -e option worked in the example originally posted.
** As a workaround, based on another SO post, I bring the files down (hadoop fs -get ...) to a local temp dir /home/user/tmp/<some uuid>/ and do what needs to be done there, then hadoop fs -put ....

Related

pg_restore: error: input file is too short (read 0, expected 5)

I have a backup of PostgreSQL database (taken from Heroku) now I want to restore this backup to my local database but it's not working. I tried from CMD and pgadmin, but no luck!
What wrong I am doing?
pg_restore --verbose --clean --no-acl --no-owner -h localhost -U nafuser -d naf latest.dump
pgadmin issue:

Add files to S3 Bucket using Shell Script

Goal: to push files in gri/ to S3 bucket using SendToS3.sh shell script.
I am following this Tutorial.
SendToS3.sh is in cwd. It needs to fetch all files, that are not in sub-folders, in cwd's gri/.
Terminal:
me#PF2DCSXD:/mnt/c/Users/me/Documents/GitHub/workers-python/workers/data_simulator/data$ ./SendToS3.sh
./SendToS3.sh: line 17: logInfo: command not found
curl: Can't open '/gri/*'!
curl: try 'curl --help' or 'curl --manual' for more information
curl: (26) Failed to open/read local data from file/application
./SendToS3.sh: line 27: logInfo: command not found
SendToS3.sh:
bucket=simulation
files_location=/gri/ # !
now_time=$(date +"%H%M%S")
contentType="application/x-compressed-tar"
dateValue=`date -R`
# your key goes here..
s3Key= # CENSORED
# your secrets goes here..
s3Secret= # CENSORED
function pushToS3()
{
files_path=$1
for file in $files_path*
do
fname=$(basename $file)
logInfo "Start sending $fname to S3"
resource="/${bucket}/${now_date}/${fname}_${now_time}"
stringToSign="PUT\n\n${contentType}\n${dateValue}\n${resource}"
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
curl -X PUT -T "${file}" \
-H "Host: ${bucket}.s3.amazonaws.com" \
-H "Date: ${dateValue}" \
-H "Content-Type: ${contentType}" \
-H "Authorization: AWS ${s3Key}:${signature}" \
https://${bucket}.s3.amazonaws.com/${now_date}/${fname}_${now_time}
logInfo "$fname has been sent to S3 successfully."
done
}
pushToS3 $files_location
Please let me know if there is anything else I can add to post.

Your system doesn't have loginfo, so maybe switch that command to echo. For your curl error it could be a file permission errors, try running:
chmod -rwx gri.
Alternatively, you could use the aws cli instead, which is much easier to use imo.

The error is at this following line. The folder /gri/ is empty or the user launching the script cannot have access to it.
curl: Can't open '/gri/*'!
Moreover, it seems that your server does not have the executable LogInfo installed, or it is not accessible from your script SendToS3.sh.
Verify the installation and add the binary to the PATH env variable.
./SendToS3.sh: line 17: logInfo: command not found
Bonus: instead of using curl, you can use aws-cli which is optimized to interract with aws components. Please find the documentation for s3 here: https://docs.aws.amazon.com/cli/latest/reference/s3/
For example, you can copy a file to w bucket with this command:
aws s3 cp <path_to_file> s3://<bucket_name>/<path>/

How to check for Jupyter active notebooks through command line

I have an AWS EMR running Jupyterhub version 0.8.1+ that I want to check if there are any active notebooks that are running any code.
I've tried the below commands but they don't seem to output what I'm looking for here since the users server is always running and notebooks can be running without any code being executed.
# only lists running servers and jovyan is always running.
sudo docker exec jupyterhub jupyter notebook list
# No useful information outputted
curl -k -i -H "Accept: application/json" "https://localhost:9443/api/sessions"
# always lists processes regardless of running notebooks
ps aux | grep ipykernel
# The last_activity only updates when a user creates a new file or folder in the ui.
curl -k https://localhost:9443/hub/api/users/$user -H "Authorization: token $admin_token" | jq -r .last_activity
curl -k https://localhost:9443/hub/api/users -H "Authorization: token $admin_token" | jq -r .last_activity
Im following this AWS blog to check if the entire EMR is idle before terminating the cluster but they never seemed to have fully implemented the jupyter checks.
https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-costs-with-idle-checks-and-automatic-resource-termination-using-advanced-amazon-cloudwatch-metrics-and-aws-lambda/
Most of the files referenced can be found in Github https://github.com/septian-putra/emr-monitoring

To see if a notebook is "idle" for "busy" you can run curl -ks https://localhost:9443/user/jovyan/api/kernels -H "Authorization: token ${admin_token}"
With this command all you need to do is put it in a simple if statement with a grep -q in order to get a true false idle value.
if [ $(curl -ks https://localhost:9443/user/jovyan/api/kernels -H "Authorization: token ${admin_token}" | grep -q "busy") ]; then
JUPYTER_BUSY_NOTEBOOKS=1
else
JUPYTER_BUSY_NOTEBOOKS=0
fi
(curl -ks For a silent output and to ignore ssl. jovyan being my admin user)
Documentation
https://jupyter-kernel-gateway.readthedocs.io/en/latest/websocket-mode.html#http-resources
/api/sessions might also be useful to look at.

Docker env variables not set while log via shell

In my settings file i am getting env variables like this
'NAME': os.environ['PG_DBNAME'], # Database
I am setting in docker file like this
-e PG_DBNAME= "mapp"
Now
The web app work fine
If i log into shell via docker exec ... bash then env variables are also set
But if i log in via ipaddress and port number from ssh client then i am able to login but env variables are not set

As commented in issue 2569:
This is expected. SSH wipes out the environment as part of the login process.
One way to work around it is to dump the environment variables in /etc/environment (e.g. env | grep _ >> /etc/environment) before starting Supervisor.
Further "login processes" should source this file, and tada! There is your environment.
That env | grep _ >> /etc/environment could be part of a default run script associated (through ENTRYPOINT or CMD) to your image.
Daniel A.A. Pelsmaeker suggests jenkinsci/docker-ssh-agent issue 33 for an approach that selects and sets all environment variables excluding a specific denylist:
For my own uses I changed that line to the following:
env | egrep -v "^(HOME=|USER=|MAIL=|LC_ALL=|LS_COLORS=|LANG=|HOSTNAME=|PWD=|TERM=|SHLVL=|LANGUAGE=|_=)" >> /etc/environment
This takes all environment variables, except those listed, and appends then to /etc/environment, overriding any previously defined there.

I also had the exact same problem. I found the example on docs.docker.com appending variables by echo'ing to /etc/profile not the nicest way to do that. So here is my solution:
Dockerbuild:
I execute the docker build by the following command which also fetches the http_proxy, https_proxy and no_proxy variables from the
current shell session. The variables are passed as agruments with the --build-arg option.
[root#localhost dock-centOS]# docker build
--build-arg http_proxy="{{ lookup('env', 'http_proxy')}}"
--build-arg https_proxy="{{ lookup('env', 'https_proxy')}}"
--build-arg no_proxy="{{ lookup('env', 'no_proxy')}}"
-t my_pv_repo:centOS-with-sshd .
Dockerfile:
I use the following dockerfile snippet for setting the enviroment variables for all users. The ARG command is used instead of
ENV because i don't want docker to persist my variables in the image. The ARG variable is only available during the docker build.
The RUN command creates a bash script which is placed in the /etc/profile.d directory. During start-up of the container
/etc/profile script is run and sources all readable files in the /etc/profile.d directory.
FROM centos:7.3.1611
ARG http_proxy=$http_proxy
ARG https_proxy=$https_proxy
ARG no_proxy=$no_proxy
ARG JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45
ARG DOMAIN_HOME=/home/oracle/w001/D1/app/user_projects/domains/fancy_app_domain
ARG PATH=$PATH:/usr/lib/jvm/jdk1.6.0_45/bin
ARG XAUTHORITY=~/.Xauthority
RUN shebang='#!/usr/bin/env bash'; \
env_vars="export http_proxy=${http_proxy} https_proxy=${https_proxy} no_proxy=${no_proxy}"; \
env_vars+=' JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45 DOMAIN_HOME=/home/oracle/w001/D1/app/user_projects/domains/fancy_app_domain'; \
env_vars+=" PATH=${PATH}:/usr/lib/jvm/jdk1.6.0_45/bin XAUTHORITY=${XAUTHORITY}"; \
echo $shebang$'\n'$env_vars > /etc/profile.d/env_vars.sh
Test result: Well lets hit the cli to check if our environment variables are available during a ssh session.
[root#localhost vagrant]# docker exec -u root -it centOS-container bash
[root#33e7efab489c /]#
[root#33e7efab489c /]#
[root#33e7efab489c /]# cat /etc/profile.d/env_vars.sh
#!/usr/bin/env bash
export http_proxy=http://10.0.2.2:3128 https_proxy=http://10.0.2.2:3128 no_proxy=localhost,127.0.0.1 JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45 DOMAIN_HOME=/home/oracle/w001/D1/app/user_projects/domains/fancy_app_domain PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/jdk1.6.0_45/bin XAUTHORITY=~/.Xauthority
[root#33e7efab489c /]#
[root#33e7efab489c /]#
[root#33e7efab489c /]# printenv
HOSTNAME=33e7efab489c
TERM=xterm
http_proxy=http://10.0.2.2:3128
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/jdk1.6.0_45/bin
DOMAIN_HOME=/home/oracle/w001/D1/app/user_projects/domains/fancy_app_domain
PWD=/
JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45
LANG=en_US.UTF-8
https_proxy=http://10.0.2.2:3128
SHLVL=1
HOME=/root
no_proxy=localhost,127.0.0.1
XAUTHORITY=/root/.Xauthority
_=/usr/bin/printenv
[root#33e7efab489c /]#
[root#33e7efab489c /]#
[root#33e7efab489c /]# exit
[root#localhost vagrant]# exit
[vagrant#localhost ~]$ logout
Connection to 127.0.0.1 closed.
me#my-mac$ ssh -X root#localhost -p 7022 -o UserKnownHostsFile=/dev/null -o IdentityFile=/development/workspace/supercalifragilisticexpialidocious-app/.vagrant/machines/default/virtualbox/private_key
The authenticity of host '[localhost]:7022 ([127.0.0.1]:7022)' can't be established.
ECDSA key fingerprint is SHA256:dTd/vsmPTbrA3kPeIfArZMFEgfdlgjGHwMgE3Z5BgBc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:7022' (ECDSA) to the list of known hosts.
/usr/bin/xauth: file /root/.Xauthority does not exist
[root#33e7efab489c ~]# su - oracle
bash-4.2$
bash-4.2$
bash-4.2$ printenv
HOSTNAME=33e7efab489c
SHELL=/bin/bash
TERM=xterm-256color
HISTSIZE=1000
http_proxy=http://10.0.2.2:3128
USER=oracle
LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
MAIL=/var/spool/mail/oracle
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/jdk1.6.0_45/bin
DOMAIN_HOME=/home/oracle/w001/D1/app/user_projects/domains/fancy_app_domain
PWD=/home/oracle
JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45
LANG=en_US.UTF-8
https_proxy=http://10.0.2.2:3128
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/oracle
no_proxy=localhost,127.0.0.1
LOGNAME=oracle
XAUTHORITY=/home/oracle/.Xauthority
_=/usr/bin/printenv

sudo apache2 stop doesn't work

not sure why but when I run:
ubuntu#ip-10-46-206-16:/etc/init.d$ sudo apache2 stop
Usage: apache2 [-D name] [-d directory] [-f file]
[-C "directive"] [-c "directive"]
[-k start|restart|graceful|graceful-stop|stop]
[-v] [-V] [-h] [-l] [-L] [-t] [-S] [-X]
Options:
-D name : define a name for use in <IfDefine name> directives
-d directory : specify an alternate initial ServerRoot
-f file : specify an alternate ServerConfigFile
-C "directive" : process directive before reading config files
-c "directive" : process directive after reading config files
-e level : show startup errors of level (see LogLevel)
-E file : log startup errors to file
-v : show version number
-V : show compile settings
-h : list available command line options (this page)
-l : list compiled in modules
-L : list available configuration directives
-t -D DUMP_VHOSTS : show parsed settings (currently only vhost settings)
-S : a synonym for -t -D DUMP_VHOSTS
-t -D DUMP_MODULES : show all loaded modules
-M : a synonym for -t -D DUMP_MODULES
-t : run syntax check for config files
-X : debug mode (only one worker, do not detach)
it doesn't seem to stop the server. I still trying to ping the ip and it's returning the default page. Is there a reason why?

Try:
sudo service apache2 stop
See here:
https://help.ubuntu.com/community/ApacheMySQLPHP
The older way would be:
sudo /etc/init.d/apache2 stop
Note that when you do sudo apache2 stop, you are running apache2 from you PATH, not from the current folder (usually, . is not in the PATH). Try sudo ./apache2 stop for that.
See here:
http://www.cyberciti.biz/faq/ubuntu-linux-start-restart-stop-apache-web-server/

This is not a programming question. I guess you have to look for answers here http://serverfault.com

You have to type full path to apache2 init script. Example:
sudo /etc/init.d/apache2 stop

check ubuntu#ip-10-46-206-16:/etc/init.d$ ./apache2 stop
this will slove your problem

One thing is missing in the command, I encountered the same issue:
sudo apache2 stop
Should be
sudo service apache2 stop
Hope no one search too far for missing service part of the command :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using mssql-tools bcp from HDFS NFS mount - hdfs

Related

pg_restore: error: input file is too short (read 0, expected 5)

Add files to S3 Bucket using Shell Script

How to check for Jupyter active notebooks through command line

Docker env variables not set while log via shell

sudo apache2 stop doesn't work

Categories

Resources