Can I use MeCab on IBM Data Science Experience - data-science-experience

I want to use Mecab on IBM Data Science Experience.
https://pypi.python.org/pypi/mecab-python3
Is it possible?

I'm afraid not, or at least not easily. That Python package requires native mecab libraries, which are not installed in the environment where DSX notebooks are running. Neither do users have permission to install them using a package manager (yum).
If you're willing to spend effort, you can try to put the libraries from a mecab rpm into the user file system. Then extend the environment variable LD_LIBRARY_PATH to include your user directory. Make sure to change the variable using Python's os.environ. This will have to be done every time the notebook executes.

Related

Should I move windows project folder into WSL?

I'm trying to set up a work environment on a new machine and I am a bit confused how best to procede.
I've set up a new windows machine and have WSL2 set-up; I plan on using that with VS Code for my development environment.
I have a previous django project that I want to continue working on stored in a folder in a thumb drive.
Do I move the [windows] project folder into the linux folder system and everything is magically ready to go?
Will my previous virtual environment in the existing folder still work or do I need to start a new one?
Is it better to just start a new folder via linux terminal and pull the project from github?
I haven't installed pip, python, or django on the windows OR linux side just yet either.
Any other things to look out for while setting this up would be really appreciated. I'm trying to avoid headaches later by getting it all set-up correctly now!
I would pull it from github, and make sure you have the correct settings for line endings, since they are different between windows and linux. Just let git manage these though:
https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings
Some other suggestions:
Use a version manager in linux to manage your python versions - something like pyenv or asdf. It will make life easier.
Make sure to always create a virtual environment for everything and don't pip install anything in your main python. (I use direnv for virtual env management)
The single exception to the previous suggestion is pipx, which I do install in the main python and then use to install things like cli tools, black, isort, pip-tools etc.
Configure VScode to use the pipx installed versions of black, flake8 etc. for linting purposes.
If you're using Docker, enable the WSL integration for your WSL flavour (probably Ubuntu). Note that docker desktop needs starting before your WSL session.

Installing Python 2.7.16 and packages offline. Concerns with dependencies

Problem
I am attempting to install Python 2.7.16, openpyxl, and pyinstaller onto a Windows 10 machine that is offline for security reasons. To clarify, I have a mapped network drive on there from which I can transfer the files I need to use.
Question
What is the best way to go about this? I currently have a .msi Python installation file directly from their website. The packages I need are packaged as .tar.gz files. I currently have those on my windows machine, but do not want to proceed until I know for sure what I need to do. Also, do I need to do anything for dependencies? If so, how do I find the dependencies for the packages I need?
Side Notes
The version of Python (2.7.16) comes with pip. Not sure if that makes a difference. Downloading and transferring things requires me to ask my admin, for him to download the files, and then transfer them to my drive so I can have them on my computer. If able, I would like to do this in as little attempts as possible.
Useful links
Python: https://www.python.org/downloads/release/python-2716/
openpyxl: https://pypi.org/project/openpyxl/#files
pyinstaller: https://pypi.org/project/PyInstaller/#files
My solution would be to seek out the offline versions of the python and pip installer and follow this guide
Also a great tip: try the complete procedure (the installing of the required software) on a seperate pc which you have disconnected and do the installation. Note everything you have to do to get it working and use those instruction on your originally intended machine. This will prevent you from having to go back and forth and scratch your head while installing on the target machine.
Please note that I have NO idea how python works and this is just a hunch from me as a programmer.
Installing Python and packages on an Offline Machine: A Comprehensive Guide
The Environment
Let us begin by defining the environment in which this guide may be of some great use. If your situation can be described by one or more of the following, you might have great results following this guide...
The machine you are developing on is offline. (No connection to the internet)
You need to develop and run Python on the machine that is completely offline.
If this sounds like you, read the following cases in which a few minor details may make a big difference in getting you started.
Case1:
You are not allowed to plug in any external media devices into the offline machine. This includes but is not limited to a USB, CD, floppy disk, or any other removable media that may be of some use in helping you transfer Python files to the offline machine.
You are allowed to map a network drive (somewhere else on the local network). This would fix the problem mentioned in number one with removable media.
Answer: In this case, just proceed with the guide, as this was my case and I will explain in detail how I solved my problem.
Case2:
There is no physical way to transfer files onto the development machine that is offline.
Answer: If this is your case, you need to get in touch with the admin team who handles the software on your development machine. Direct them to this guide to proceed.
Let's Get Started
Warning A:
The following must be performed on a computer with an internet connection. It is impossible to download things from any website without an internet connection.
Warning B:
There is a longer way, and there is a shorter way to do the following. To avoid the longer way, you must be able to install python on a different machine that is online. This can be the same machine that you are using to download the packages and python version, or it can even be a home machine. This can be any machine in the world that is on the internet. It's sole purpose will be to help you identify the dependencies of each package.
Installing Python
Visit the python website and identify the version you want. 2.7.9 and up is recommended for this guide. Download the file for your specific system.
Python 2.7.9 : https://www.python.org/downloads/release/python-279/
Python 3.7.3 : https://www.python.org/downloads/release/python-373/
The reason I provided Python 2.7.9 is because that is the earliest 2.7.x version that comes with pip (a package manager).
Visit the python package index to locate the packages you will be using in your python project. https://pypi.org/
Search the package you need, go to the downloads, and get the (.tar.gz) file. Not the .whl files unless you know what you are doing with those.
Tip: If you want to keep track of the packages you are installing, I suggest you put them all in one folder somewhere you can find, or just write them down on paper.
Unpack the .tar.gz package files. You can get rid of the .tar.gz once you unpack them as they will not needed any longer.
Install the version of python that you downloaded for your system in step 1 above.
(This may just be running the .msi file for windows or unpacking some files for linux) If you are not sure how, just look at this brilliant guide
https://realpython.com/installing-python/
Now you should be able to go to your terminal and type "python" and get the python interpreter to open up. If you get a "cannot find python command" you need to setup your path variable.
Windows guide: https://geek-university.com/python/add-python-to-the-windows-path/
Linux guide: https://www.tutorialspoint.com/python/python_environment.htm
Your python installation is done! And your packages should also be ready to install!
Installing Python Packages
What you need to know here is that MOST all python packages have dependencies, which are other packages which packages need installed before they can be installed. If you need more explanation on dependencies, read here: https://www.fullstackpython.com/application-dependencies.html
Before proceeding be sure to add the Python/Scripts folder to your path variable too or pip will not work. Follow this link for instructions. https://appuals.com/fix-pip-is-not-recognized-as-an-internal-or-external-command/
Install packages using pip install [package_name] for every package you need, on your machine that is on the internet, and then do a pip freeze to see all the packages installed.
Once you can see all the packages installed, which will include the dependencies for the ones you ran pip install on, you need to manually download these dependencies from the python package index https://pypi.org/ just like you did with the regular packages.
Moving Offline
Once you have identified all the packages you will need, and all of their dependencies, you will need to download them, unpack all of them, and move them into one folder, which I will call "OFFLINE_SETUP_FOLDER".
To be clear:
The packages we installed before was only to find out the dependencies we were going to need. You do not have to re-download the packages you have already downloaded before running pip install. You should only need to download the dependencies you have found during the pip freeze command.
Finally you need to copy into the "OFFLINE_SETUP_FOLDER" your python installation file, be it a .msi file for windows, or the .tar file for linux.
Your "OFFLINE_SETUP_FOLDER" should contain the following...
In the following, package can be the name of any package that you downloaded, and the a and b inpackage1a and package1b just represent dependencies for that package. These file names are just examples for packages
python.msi (installation file for python)
/package1 (normal package folder)
/package1a (package dependency folder)
/package1b (package dependency folder)
/package2 (normal package folder)
/package3 (normal package folder)
/package3a (package dependency folder)
Once this is complete, you need to move that folder onto the machine that is completely offline form the network.
Then run the installation for python as you did before and install it on the machine. Do no forget to setup the path variable. Refer back to the Installing Python section if needed.
Open your terminal or CMD and CD into the "OFFLINE_SETUP_FOLDER".
Now you need to CD into each individual package folder, and run this command: python setup.py install and let it run.
If the package install fails, it will be because one of the dependencies has not been installed. If this is the case, CD into the dependency that is says is missing, and run python setup.py install in there first.
Keep repeating these steps until all packages and dependencies have been installed.
This is the end of this python guide for installing python on an offline machine. I hope this helped :)

Can you add CLI tools to a Cloud Foundry app?

Terribly sorry if this is an obvious question. But I have a webapp that relies on a CLI tool to get it to work. I was wondering if there was a way I could specify this without using a custom buildpack. And how to go about doing this if possible
Any help on this would be great, thanks
Can you add CLI tools to a Cloud Foundry app?
It's not possible to directly install things with apt or apt-get. Your app runs as an unprivileged user and is unable to run those tools to install things.
This leaves you with a couple options:
Get the binary and bundle it with your app. Some people (no judgement from me though) would say that your app is responsible for bringing everything it needs to run anyway, so you should be doing this already.
Twelve-factor apps also do not rely on the implicit existence of any system tools. Examples include shelling out to ImageMagick or curl. [1]
This path works well for dependencies that are small or self contained, like statically built Go apps. If your app need shared libraries or other resources to function, you need to bundle those with your app too. It's also not great if the size of what you bundle is large. Everything you bundle will be pushed up with the app, so it can slow down your pushes. You are also responsible for tracking updates and making sure that you have the latest, bug free & security patched binaries & libraries.
The general steps for doing this are:
Create a folder like binaries/ under the root of your app, with subfolders of bin/ and lib/.
Place all your binaries under binaries/bin and any shared libraries they require under binaries/lib.
Add a .profile file at the root of your app. This will be sourced prior to your app starting so it will put any binaries you bundle on the path and add libraries to the search path.
In .profile put the following:
export PATH=$HOME/binaries/bin:$PATH
export LD_LIBRARY_PATH=$HOME/binaries/lib:$LD_LIBRARY_PATH
That should be it. Just push your app with all the new files.
Another easier option, is to use the Apt buildpack [2]. This buildpack can install required dependencies using apt. You just need to add an apt.yml file to the root of your app & run your app with multi-buildpacks (apt buildpack first, then your normal buildpack).
The main benefit of doing this is that you don't have to manage the dependencies. The apt buildpack will automatically install them from the repo you tell it to use, so it'll pick up new versions from there as well. This is good if what you need to install has a lot of dependencies, particularly sensitive dependencies (like openssl) or dependencies that get updated/patched often like other language runtimes (Python, Perl, Ruby, etc...).
Other benefits. It's easier because the buildpack takes care of adjusting PATH & LD_LIBRARY_PATH. It also makes the app size smaller so pushes are faster.
The downsides of this option are that the apt-buildpack is not an official buildpack (it's community maintained). It also works best when you have Internet access, so it can download binaries from the Internet, although you can work around this by using an internal repo.
There's a couple other options as well, but I wouldn't recommend them unless both options above are definitely not going to work for you.
Use Docker. You can set up your own Docker container with all the dependencies you need, plus your app code and cf push the Docker image to CF. The downside of this is that your lose the advantages of using buildpacks, so you're back to building and managing Docker images and all the required dependencies of your app all on your own.
You could create your own custom buildpack and supply the dependencies that way. I don't see any reason you'd want to do this though. It's a decent bit of work and in the end, you'd have something that's just more brittle and less flexible than Apt Buildpack.
It's technically possible to ship your own rootfs, but you really really shouldn't (I'm just including this to be thorough). This is the base file system that's used by all apps on CF. Doing this has a lot of drawbacks though, chiefly being that it's difficult. It also applies to all apps on the foundation, can bloat the size of the rootfs, and makes a larger attack surface for anything using the rootfs (i.e. all apps).
Hope that helps!
[1] https://12factor.net/dependencies
[2] https://github.com/cloudfoundry/apt-buildpack

Migrate a python software easily to other hosts (running on ubuntu)

I have built a software running on an ubuntu 14.04, it runs on a python environment with a lot of dependencies (databases, flask, etc...).
I am trying to figure out the best way to have an easily and portable environment for migration. For example, if I want to install the software on another computer or maybe AWS...
I checked out some solutions and for now, Vagrant, Puppet, Chef look interesting but I am a bit confused.
Furthermore, the software requires high performances and it would be way too slow to run it on a VM.
What are your thoughts on that ? Basically an easy way to install a software with all his dependencies and being able to migrate the environment on whatever host I would need.
Thanks a lot

Tomcat + Hudson and testing a Django Application

I'm using Hudson for the expected purpose of testing our Django application. In initial testing, I would deploy Hudson using the war method:
java -jar hudson.war
This worked great. However, we wanted to run the Hudson instance on Tomcat for stability and better flexibility for security.
However, now with Tomcat running Hudson does not seem to recognize previously-recognized Python libraries like Virtualenv. Here's an output from a test:
+ bash ./config/testsuite/hudson-build.sh
./config/testsuite/hudson-build.sh: line 5: virtualenv: command not found
./config/testsuite/hudson-build.sh: line 6: ./ve/bin/activate: No such file or directory
./config/testsuite/hudson-build.sh: line 7: pip: command not found
virtualenv and pip were both installed using sudo easy_install, where are they?
virtualenv: /usr/local/bin/virtualenv
pip: /usr/local/bin/pip
Hudson now runs under the tomcat6 user. If I su into the tomcat6 user and check for virtualenv, it recognizes it. Thus, I am at a loss as to why it doesn't recognize it there.
I tried removing the commands from a script and placing it line-by-line into the shell execute box in Hudson and still same issue.
Any ideas? Cheers.
You can configure your environment variables globally via Manage Hudson ->
Environment Variables or per machine via Machine -> Configure ->
Environment Variables (or per build with the Setenv plugin). It sounds like
you may need to set the PATH and PYTHONPATH appropriately; at least that's the
simple solution.
Edited to add: I feel as though the following is a bit of a rant, though not really directed at you or your situation. I think that you already have the right mindset here since you're using virtualenv and pip in the first place -- and it's not unreasonable for you to say, "we expect our build machines to have virtualenv and pip installed in /usr/local," and be done with it. Take the rest as you will...
While the PATH is a simple thing to set up, having different build
environments (or relying on a user's environment) is an integration "smell".
If you depend on a certain environment in your build, then you should either
verify the environment or explicitly set it up as part of the build. I put
environment setup in the build scripts rather than in Hudson.
Maybe your only assumption is that virtualenv and pip are in the PATH (because
those are good tools for managing other dependencies), but build
assumptions tend to grow and get forgotten (until you need to set up a new
machine or user). I find it useful to either have explicit checks, or refer to
explicit executable paths that are part of my defined build environment. It is
especially useful to have a explicitly defined environment when you have
legacy builds or if you depend on specific versions of your build tools.
As part of builds where I've had environment problems (especially on Windows
with cygwin), I print the environment as the first build step. (But I tend to
be a little paranoid proactive.)
I don't mean to sound so preachy, I'm just trying to share my perspective.
Just to add to Dave Bacher's comment:
If you set your path in .profile, it is most likely not executed when running tomcat. The .profile (or whatever the name is on your system) is only executed when you have a login shell. To set necessary environment variables, you have to use a different set of file. Sometimes they are called .env and they exist on global and user level. In my environment (AIX), the user level .env file can have a different name (name is set in the env variable either in global environment file (eg. /etc/environment) or by parameter, when starting the shell).
Disclaimer: This is for the IBM AIX ksh, but should be the same for ksh on other systems.
P.S. I just found a nice explanation for .profile and .env from the HP site. Notice that they speak of a login shell (!) when they speak about the execution of the .profile file.