pre-commit.com: Same version in .pre-commit-config.yaml and requirements.txt

I would like to use the exact same version of flake8 in requirements.txt and in .pre-commit-config.yaml.
To avoid redundancy, I would like to keep the flake8 version number in exactly one place in my repo.
Can pre-commit read the version number of flake8 from requirements.txt?

it cannot
pre-commit intentionally does not read from the repository under test as this makes caching intractable
you can read more in this issue and the many duplicate issues linked there
for me, I no longer include flake8, etc. in my requirements files as pre-commit replaces the need to install linters / code formatters elsewhere
disclaimer: I created pre-commit
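For reference, pinning flake8 directly in .pre-commit-config.yaml looks roughly like this; a minimal sketch, where the rev shown is just an illustrative release, not a recommendation:

# .pre-commit-config.yaml -- the rev is then the single place the flake8 version lives
repos:
  - repo: https://github.com/PyCQA/flake8
    rev: 6.1.0
    hooks:
      - id: flake8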

Related

differences between sdist and hackage build test

I have problems getting packages uploaded to hackage to pass the build test. The packages compile fine with cabal (and with stack), and I can upload the sdist files to hackage.
I found some confusing error messages, likely related to "single character directories".
Once I cleaned that up, the upload completes, but the packages fail the check during the build. Most of the issues seem related to the inclusion of C sources and C headers (see cabal build error `gcc` failed: cannot find `#include`).
What are the differences in the requirements for a package to build on hackage? Could sdist check these requirements, to avoid a plethora of futile package versions on hackage, which can never be removed? Or is there a way to check whether a package builds on hackage without publishing it (i.e. without making it permanent)?
Can somebody point me to a comprehensive explanation of the requirements for hackage uploads?
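For context, these are roughly the .cabal fields that control whether bundled C sources and headers end up in the sdist; a generic sketch with placeholder paths, not a diagnosis of the failing packages:

-- headers that are only #included at build time usually need to be listed here
-- so that "cabal sdist" packs them into the tarball
extra-source-files:  include/*.h

library
  -- directory searched for the bundled headers
  include-dirs:      include
  -- C files compiled into the library
  c-sources:         cbits/foo.c
  -- headers installed for packages that depend on this library
  install-includes:  foo.h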

django: pip install 'app' installs to the venv?

I read that it is not proper to have the venv in the git repo, which should instead be taken care of by requirements.txt, but I have run into a problem...
If I am working in my venv and I pip install some app, it goes into the venv's site-packages. It still works fine if I add it to INSTALLED_APPS, but what if I make some changes within that directory? Then git doesn't track them, and I am out of luck when I try to push.
What is the proper way to do this?
EDIT: I must be having a huge miscommunication here, so let me explain with a concrete example...
I run...
pip install django-messages
This installs django-messages into my venv. I know I can run...
locally: pip freeze > requirements.txt
remotely: pip install -r requirements.txt
My problem is that I want to make changes to django-messages/templates or django-messages/views, thus making my django-messages deviate from the one which can be installed from requirements.txt.
I don't see how those changes can stay in my venv without being completely uneditable/untrackable.
This is exactly how it is supposed to work. You track what libraries you install via your requirements.txt, which is committed along with your code. You use that file to generate your venv, and the libraries are installed there. You don't include the venv itself in your repo.
Edit: The reason you are finding this hard is that you are not supposed to do it. Don't change third-party projects; you should never need to. They will be configurable.
If you really really find something you need to fix, do as suggested in the comments and fork the app. But this is definitely not something you need to do all the time, which points to the likelihood that you have not understood how to configure the apps from within your own project.
For example, in the case of customising templates, you can simply define the templates inside your own templates dir, rather than editing the ones provided with the app; Django does the right thing and uses yours first.
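As a sketch of what that looks like on disk (the template name below is hypothetical; it just has to match the relative path the app itself uses, and your project-level templates directory has to be listed in your settings):

myproject/
    templates/
        django_messages/
            inbox.html   <- your copy; the template loader finds this before the
                            version bundled inside the installed app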
From your edits it looks like what you want to do is fork the django-messages library. That means installing it into site-packages is a bad idea in the first place, since site-packages is not supposed to be version-controlled or edited; it is designated for third-party software. You have two options. You can grab the source from GitHub, put it somewhere your Django project can find it (you may need to fiddle with your Python path), and add that location to git, perhaps even as your own fork on GitHub. The second option is to use pip install -e with the repository URL to have pip install an "editable" version. The advantage of the first way is better control over your changes; the advantage of the second is having pip manage the source download and install.
That being said, you seem fairly new to the Python ecosystem. Are you REALLY sure you want to maintain your own fork? Is there some functionality missing that you want to add to the messages library? You do know that you can override every single template without changing the actual library code?
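If you do go the editable-install route, note that pip expects a VCS URL rather than a bare domain. Roughly, with a placeholder fork URL:

# pip clones the source into a local src/ checkout and installs it in "editable"
# mode, so your changes take effect without reinstalling
pip install -e git+https://github.com/yourname/django-messages.git#egg=django-messages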

How to prevent accidentally including old headers?

Build systems frequently have separate build and install steps. Sometimes older versions of a package's headers are already installed on the operating system, and those may be picked up instead of the headers in the source tree. This can lead to very subtle and strange behavior that is difficult to diagnose, because the code looks like it does one thing and the binary does another.
In particular, my group uses CMake and C++, but this question is also more broadly relevant.
Are there good techniques to prevent old headers from being picked up in a build?
1. Uninstall
Uninstall the package from CMAKE_INSTALL_PREFIX while hacking on the development version.
Pros: very effective
Cons: not flexible
2. Custom install location
Use a custom location for the installed target; don't add the custom install prefix to the build.
Pros: very flexible
Cons: if every package uses this technique, tons of -I options get passed to the compiler and tons of <PACKAGE>_ROOT variables to the cmake configure step.
3. Include priority
Use header search priority. See the include_directories command and its AFTER/BEFORE suboptions (a minimal sketch follows just below this list).
Pros: flexible enough
Cons: sometimes it's not a trivial task if you have a lot of find_package/add_subdirectory commands; it's error-prone, and the errors are not detected by autotesting.
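A minimal CMake sketch of technique 3, with a placeholder project layout:

# BEFORE prepends the directory to the include search list, so the in-tree
# headers win over any older copies sitting under an install prefix.
cmake_minimum_required(VERSION 3.5)
project(example CXX)
include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/include)
add_executable(app src/main.cpp)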
BTW
Conflicts can occur not only between the build and install directories, but also within the install directory itself. For example, version 1.0 installs A.hpp and B.hpp, while version 2.0 installs only A.hpp. If you install the 1.0 and then the 2.0 targets into the same prefix, some #include <B.hpp> errors will not be detected locally. This kind of error is easily detected by autotesting (the clean environment of a CI server doesn't have the stale B.hpp from version 1.0). An uninstall command can also be helpful.
The same problem was recently fixed in the shogun package. You basically need your source folders containing your header files to be passed via -I to gcc before the system folders. You don't need to pass the system folders as -I to gcc at all.
Have a look at the search path here. You might need to have a proper way of including your header files in your source code.
This is the pull request which fixed the problem I guess.
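A minimal illustration of that ordering, with placeholder paths:

# directories given with -I are searched before the standard system directories,
# so the project's own headers shadow any older installed copies
gcc -Iinclude -c src/foo.c -o foo.o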

Are there any downsides to using virtualenv for scientific python and machine learning?

I have received several recommendations to use virtualenv to clean up my Python modules. I am concerned because it seems too good to be true. Has anyone found downsides related to performance or memory issues when working with multicore settings, StarCluster, numpy, scikit-learn, pandas, or the IPython notebook?
Virtualenv is the best and easiest way to keep some sort of order when it comes to dependencies. Python is really behind Ruby (bundler!) when it comes to dealing with installing and keeping track of modules. The best tool you have is virtualenv.
So I suggest you create a virtualenv for each of your applications, put together a file listing all the 'pip install' commands you need to build the environment, and make sure you have a clean, repeatable process for creating that environment.
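A sketch of that workflow, with placeholder paths and package names:

# create and activate an isolated environment for one project
virtualenv ~/envs/myproject
source ~/envs/myproject/bin/activate

# install what you need and record it in one file
pip install numpy scikit-learn pandas
pip freeze > requirements.txt

# recreate the same environment elsewhere (new machine, CI, cluster node)
pip install -r requirements.txt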
I think the nature of the application makes little difference. There should not be any performance issue, since all virtualenv does is load libraries from a specific path rather than from the default location.
In any case (this may be completely irrelevant), if performance is an issue, then perhaps you ought to be looking at a compiled language. Most likely, though, any performance bottleneck can be improved with better coding.
There's no performance overhead to using virtualenv. All it's doing is using different locations in the filesystem.
The only "overhead" is the time it takes to set it up. You'd need to install each package in your virtualenv (numpy, pandas, etc.)
Virtualenvs do not deal with C dependencies, which may be an issue depending on how keen you are on reproducible builds and on capturing all of the machine setup in one process. You might end up needing to install C libraries through another package manager such as brew, apt, or rpm, and these dependencies can differ between machines or change over time. To avoid this, you might end up using Docker and friends, which then adds another layer of complexity.
conda tries to address the non-Python dependencies. The issue is that it is bigger and slower.

Is there a C++ dependency index somewhere?

When trying new software and compiling with the classic ./configure, make, make install process, I frequently see something like:
error: ____.h: No such file or directory
Sometimes, I get really lucky and apt-get install ____ installs the missing piece and all is well. However, that doesn't always happen and I end up googling to find the package that contains what I need. And sometimes the package is the wrong version or flavor and is already used by another package that I downloaded.
How do people know which packages contain which .h files or whatever resource the compiler needs? Is there a dependency resolver website or something that people use to decode failed builds to missing packages? Is there a more modern method of automatically downloading and installing transitive dependencies for a build (somewhat like Java's Maven)?
You can also use "auto-apt ./configure" (on Ubuntu, and probably also on Debian?) and it will attempt to download dependencies automatically.
If it's a package in Debian, you can use apt-get build-dep to get all deps.
Otherwise, read the README that comes with the program -- hopefully, it lists all the deps for that program.
The required packages will hopefully be listed in the documentation for building the package. If it says you require foo, you'll probably want to look for foo and foo-devel, and perhaps libfoo-devel. If that doesn't help, on Fedora I'd do something like
yum install /usr/include/_____.h
(yum will look for the package containing said file). If none of the above works, look for the file name on Google; that should tell you which package it comes from. But then the going will get rough...
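For completeness, the usual file-to-package lookups look roughly like this (foo.h is a placeholder, and apt-file has to be installed separately):

# Fedora / RHEL: ask which package provides a file
yum provides '*/foo.h'

# Debian / Ubuntu: search the package contents index (run apt-file update first)
apt-file search foo.h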