At the moment, I have two GitHub repositories, repo1 and repo2. Both are Django projects created by our team. In requirements.pip in ~/work_projects/repo1, I have the line
-e git+ssh://git@gitlab.com/repo2.git@de5622dcf0b9a084f9b0a34cdd1d932026904370#egg=repo2
Hence, repo2 becomes a library used by repo1 in ~/.virtualenvs/venv/src (repo1's virtual environment). Right now I need to modify both repositories at the same time: each time I modify repo2, I need to test the results in repo1, i.e. see the impact the modified repo2 has on repo1.
I don't want to push my changes to GitHub and reinstall repo2 into repo1 every time I want to see those changes. How could I make this work easily? Is there a workaround?
I have a similar setup, and I usually install from a local path (run from the console):
pip install -e ../repo2
And since you said git, and you also said you don't want to push, but nothing about committing, here's a version that can install your library from a tag, branch or commit in a local git repo:
pip install -e "git+file://$HOME/work_projects/repo2@develop#egg=repo2"
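If you would rather keep the reference in requirements.pip itself while developing (a local-only line to revert before committing; the absolute path below is a placeholder for your own), the same editable install works there too:
# development only -- do not commit this line
-e /home/<user>/work_projects/repo2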
I would like to run mypy static type checks with pre-commit and thus have the below config in .pre-commit-config.yaml
repos:
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: 'v1.0.0'
  hooks:
  - id: mypy
    args: ['--ignore-missing-imports', '--cache-dir', '/dev/null', '--show-error-codes']
However, I get errors like No module named 'mypy_django_plugin', and basically the same for all project dependencies. I am aware additional_dependencies should be used to list dependencies. The thing is, the project has 30+ dependencies and manually listing them here is a bit impractical (same goes for keeping them in sync with Pipfile).
I have also read elsewhere (SO, pre-commit GitHub) that installing dependencies dynamically from a requirements file is not supported.
So is it somehow possible to populate additional_dependencies from Pipfile.lock? Is there a solution for Pipenv similar to the one that exists for Poetry (see link below)? Or is there another workaround to make mypy work?
PS:
How to have a single source of truth for poetry and pre-commit package version? deals with the same problem, but the post and solution are Poetry specific.
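(For context, this untested one-liner is the kind of helper I imagine such a sync script would need, assuming the standard default/develop layout of Pipfile.lock; its output would then be spliced into the hook's additional_dependencies list:)
# print pinned "package==version" lines from Pipfile.lock
jq -r '(.default + .develop) | to_entries[] | "\(.key)\(.value.version)"' Pipfile.lock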
What is the idiomatic way to write a Dockerfile for building against many different versions of the same compiler?
I have a project which tests against a wide range of versions of different compilers like gcc and clang as part of a CI job. At some point, the agents for the CI tasks were updated/changed, resulting in newer jobs failing, so I've started looking into dockerizing these builds to try to guarantee better reliability and stability.
However, I'm having some difficulty understanding what a proper and idiomatic approach is to producing build images like this without causing a large amount of duplication caused by layers.
For example, let's say I want to build using the following toolset:
gcc 4.8, 4.9, 5.1, ... (various versions)
cmake (latest)
ninja-build
I could write something like:
# syntax=docker/dockerfile:1.3-labs
# Parameterizing here possible, but would cause bloat from duplicated
# layers defined after this
FROM gcc:4.8
ENV DEBIAN_FRONTEND noninteractive
# Set the work directory
WORKDIR /home/dev
COPY . /home/dev/
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Default command is to use CMake
CMD ["cmake"]
However, the installation of tools like ninja-build and cmake occur after the base image, which changes per compiler version. Since these layers are built off of a different parent layer, this would (as far as I'm aware) result in layer duplication for each different compiler version that is used.
One alternative to avoid this duplication could hypothetically be to use a smaller base image like alpine, with separate installations of the compiler instead. The tools could be installed first so the layers remain shared, and only the compiler changes as the last layer. However, this presents its own difficulties, since it's often the case that certain compiler versions may require custom steps, such as installing certain keyrings.
What is the idiomatic way of accomplishing this? Would this typically be done through multiple docker files, or a single docker file with parameters? Any examples would be greatly appreciated.
I would separate the parts of preparing the compiler and doing the build, so the source doesn't become part of the docker container.
Prepare Compiler
For preparing the compiler I would take the ARG approach, but without copying the data into the container. If you want fast retries and have enough resources, you can spin up multiple instances at the same time.
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
ENV DEBIAN_FRONTEND noninteractive
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Set the work directory
VOLUME /src
WORKDIR /src
CMD ["cmake"]
Build it
Here you have a few options. You could either prepare a volume with the sources or use bind mounts together with docker exec, like this:
#bash style
for compiler in gcc:4.9 gcc:4.8 gcc:5.1
do
    docker build -t mytag-${compiler} --build-arg COMPILER=${compiler} .
    # place to clean the target folder
    docker run -v $(pwd)/src:/src mytag-${compiler}
done
And because the source is not part of the docker image you don't have bloat. You can also have two mounts, one for a readonly source tree and one for the output files.
Note: If you remove the CMD (the cmake command), you could also spin up the docker containers in parallel and use docker exec to start the builds; a sketch follows. The downside of this is that you have to take care of out-of-source builds to avoid clashes in the output folder.
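A hedged sketch of that parallel variant (untested; it assumes the images built by the loop above and a CMake new enough for -S/-B; the container names and build paths are illustrative):
for compiler in gcc:4.9 gcc:4.8 gcc:5.1
do
    # keep a long-running container per compiler, with a read-only source mount
    docker run -d --name "build-${compiler/:/-}" -v "$(pwd)/src:/src:ro" "mytag-${compiler}" sleep infinity
    # out-of-source build in a per-container directory to avoid clashes
    docker exec -d "build-${compiler/:/-}" sh -c "cmake -S /src -B /tmp/out -G Ninja && cmake --build /tmp/out"
done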
Put an ARG before the FROM and then use that ARG as the FROM image,
so:
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
# rest goes here
then you
docker build . -t test/clang-8 --build-arg COMPILER=clang-8
or similar.
If you want to automate it, just make a list of compilers and a bash script looping over the lines in your file, passing each line as input to the tag and the COMPILER build arg, as sketched below.
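A minimal sketch of such a loop (the file name compilers.txt is an assumption; the substitution just replaces the colon so each compiler gets a flat tag like test/gcc-4.8):
# compilers.txt holds one base-image reference per line, e.g. "gcc:4.8"
while read -r compiler; do
    docker build . -t "test/${compiler/:/-}" --build-arg COMPILER="${compiler}"
done < compilers.txt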
As for CMake, I'd just do:
RUN wget -qO- "https://cmake.org/files/v3.23/cmake-3.23.1-linux-"$(uname -m)".tar.gz" | tar --strip-components=1 -xz -C /usr/local
When copying, I find it cleaner to do
WORKDIR /app/build
COPY . .
As far as I know, there is no way to do that easily and safely. You could use a RUN --mount=type=cache, but the documentation clearly says that:
Contents of the cache directories persist between builder invocations without invalidating the instruction cache. Cache mounts should only be used for better performance. Your build should work with any contents of the cache directory as another build may overwrite the files or GC may clean it if more storage space is needed.
I have not tried it, but I guess the layers are duplicated anyway; you just save time, assuming the cache is not emptied.
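For reference, a sketch of what such a cache mount could look like in the Dockerfile from the question (untested here; it should speed up apt between builds, but per the quoted docs the layers are still produced per image):
# syntax=docker/dockerfile:1.3-labs
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt update && apt install -y cmake ninja-build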
The other possible solution is similar to the one you mention in the question: starting with the tools installation and then adding the gcc image on top of it. Instead of starting with an alpine image, you could start FROM scratch. scratch is basically the empty image; you could COPY the files generated by
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
Then you COPY the entire gcc filesystem. However, I am not sure it will work because the order of the initial layers is now reversed. This means that some files that were in the upper layer (coming from tools) now are in the lower layer and could be overwritten. In the comments, I asked you for a working Dockerfile because I wanted to try this out before answering. If you want, you can try this method and let us know. Anyway, the first step is extracting the files created from the tools layer.
How to extract changes from a layer?
Let's consider this Dockerfile and build it with docker build -t test .:
FROM debian:10
RUN apt update && apt install -y cmake && ( echo "test" > test.txt )
RUN echo "new test" > test.txt
Now that we have built the test image, we should find 3 new layers. You mainly have 2 ways to extract the changes from each layer:
the first is to docker inspect the image and then find the ids of the layers in the /var/lib/docker folder, assuming you are on Linux. Each layer has a diff subfolder containing the changes. Actually, I think it is more complex than this, which is why I would opt for...
skopeo: you can install it with apt install skopeo and it is a very useful tool for operating on docker images. The command you are interested in is copy, which extracts the layers of an image and exports them as .tar files:
skopeo copy docker-daemon:{image_name}:latest "dir:/home/test_img"
where image_name is test in this case.
Extracting layer content with Skopeo
In the specified folder, you should find some tar files and a configuration file (look at the skopeo copy command output and you will know which one it is). Then extract each {layer}.tar into a different folder and you are done.
Note: to find the layer containing your tools, just open the configuration file (maybe using jq, because it is JSON) and take the diff_id that corresponds to the RUN instruction you find in the history property. You should understand it once you open the JSON configuration. This is unnecessary if you have a small image that has, for example, debian as parent image and a single RUN instruction containing the tools you want to install.
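For example, something along these lines (the config file is named after its digest, so the path below is a placeholder):
# pair the layer diff_ids with the instructions that created them
jq '{diff_ids: .rootfs.diff_ids, history: [.history[] | select(.empty_layer != true) | .created_by]}' "/home/test_img/<config>.json"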
Get GCC image content
Now that we have the tools layer content, we need to extract the gcc filesystem. We don't need skopeo for this one; docker export is enough:
create a container from gcc (with the tag you need):
docker create --name gcc4.8 gcc:4.8
export it as tar:
docker export -o gcc4.8.tar gcc4.8
finally, extract the tar file (a sketch follows).
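Something like this (the folder name is chosen to match the Dockerfile below):
# unpack the exported gcc filesystem next to the tools layer folder
mkdir gcc_4.x
tar -xf gcc4.8.tar -C gcc_4.x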
Putting it all together
The final Dockerfile could be something like:
FROM scratch
COPY ./tools_layer/ /
COPY ./gcc_4.x/ /
In this way, the tools layer is always reused (unless you change the content of that folder, of course), but you can parameterize the gcc_4.x folder with the ARG instruction, for example, as sketched below.
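A hedged sketch of that parameterization (GCC_DIR is a name introduced here purely for illustration):
FROM scratch
ARG GCC_DIR=gcc_4.x
COPY ./tools_layer/ /
COPY ./${GCC_DIR}/ /
# build with, e.g.: docker build --build-arg GCC_DIR=gcc_4.8 -t test-gcc4.8 .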
Read carefully: all of this is not tested but you might encounter 2 issues:
the gcc image overwrites some files you have changed in the tools layer. You could check if this happens by computing the diff between the gcc layer folder and the tools layer folder. If it happens, you can only keep track of those files and add them in the Dockerfile after the COPY ./gcc ... with another COPY.
When in the upper layer a file is removed, docker marks that file with a .wh. prefix (a "whiteout" file; not sure if it is different with skopeo). If in the tools layer you delete a file that exists in the gcc layer, then that file will not be deleted using the above Dockerfile (the COPY ./gcc ... instruction would overwrite the .wh. marker). In this case too, you would need to add an additional RUN rm ... instruction.
This is probably not the correct approach if you have a more complex image than the one you are showing us. In my opinion, you could give this a try and just see if it works out with a single Dockerfile. Obviously, if you have many compilers, each one having its own tool set, the maintainability of this approach could be a real burden. Instead, if the Dockerfile is more or less linear for all the compilers, this might be good (after all, you do not do this every day).
Now the question is: is avoiding layer replication so important that you are willing to complicate the image-building process this much?
Context:
MainProject depends on a header-only dependency Module.
Both MainProject and Module are:
still under development and subject to modifications
modern CMake projects
independent repositories on Github
controlled by me
Problem:
A few months ago, I tried without success to manage this dependency using CMake and versioning. Pressed by deadlines, I ended up opting for the "simplest solution": copy-pasting the Module headers into MainProject. Developing MainProject led to adding features and modifying interfaces in the local copy of Module. Now there are two diverging versions of Module.
How it could have worked
It could have worked if Module was very stable (copy-pasting headers is actually the solution I opted for with dependencies that are stable and that I don't own).
I could have modified/committed/pushed/re-copied/re-pasted the Module repository for every modification I wanted to bring. But of course I did not, because ... time and deadlines.
Question
Now I would like to step back from this solution (i.e., reflect the modifications back onto the initial Module project) and choose a better dependency management strategy.
What I can think of
create a new branch update on the Module git project, copy-paste the modified version, commit it, and use git diff to check the differences against branch master
use one or a combination of these three approaches (but I don't know how to choose):
git submodules
git subtrees
C++20 modules
Your Module project literally is a Git submodule: an independently-updated history, from which your MainProject builds use specific revisions.
Developing MainProject led to adding features and modifying interfaces in the local copy of Module. Now there are two diverging versions of Module.
Quickest: from a clean checkout of your current MainProject revision,
git tag slice $(git commit-tree -m slice @:path/to/Module)
git rm -rf path/to/Module
git submodule add u://r/l path/to/Module
git push path/to/Module slice
cd path/to/Module
git read-tree -um slice
git commit -m 'Module content from MainProject'
and now you've got your content and ancestry looking serviceable, and you can add labels and push it wherever it needs to go, e.g. git checkout -b MainProjectModule; git push -u origin MainProjectModule
If you've got a long history of Module changes in your main project that you want to preserve in the Module history proper, it's doable, and even fairly efficient, but you'll need to adapt some history surgery to achieve it: instead of tagging a nonce commit, tag the submodule commit that command produces and merge that, rather than just adding its tip content as a new commit.
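If you do go that route, one hedged sketch of such surgery uses git subtree, which can carve the Module subdirectory's history out of MainProject as a standalone branch (run before the git rm above; the placeholder URL matches the one used earlier):
# split the subdirectory's history into its own branch, then publish it
git subtree split --prefix=path/to/Module -b module-history
git push u://r/l module-history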
I am wondering if anyone has experience working on Django projects in a small team (3 in my case), using Git source control management.
The project is hosted on a development server, which is why I am having such a problem. Developers can't see if their code works until they commit their changes to their local repository, then push those changes to the server. Even then, however, git doesn't seem to be updating the files inside the directory holding the repository on the server - probably because it only stores the changes to save space.
We are beginning to tread on each other's toes when working on this project, so some kind of version control is required - but I just can't figure out a solution.
If anyone has overcome a similar problem I'd love to hear how it can be done.
When pushing to a remote repository, best results are when the remote repository is a "bare" repository with no working directory. It sounds like you have a working directory on the remote repository, which will not be updated by Git when doing a push.
For your situation, I would recommend that developers have their own testing environment that they can test against locally before having to push their code anywhere else. Having one central location where everybody needs to push their work before they can even try it will lead to much pain and suffering.
For deployment, I would recommend pushing to a central "bare" repository, then having a process where the deployment server pulls the latest code from the central repository into its working directory.
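A minimal sketch of that arrangement (paths and host names are illustrative):
# on the central server: a bare repository, safe to push to
git init --bare /srv/git/project.git
# on the deployment server, once:
git clone ssh://central/srv/git/project.git /var/www/project
# on each deployment:
cd /var/www/project && git pull origin master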
When you push to a (shared) git repository, it doesn't update that repository's working files. That's basically because the working files might be dirty, in which case you'd have to merge, and for that you need to have full shell access there, which may not be the case in general.
If you want to have the most recent "master" of the shared repo checked out somewhere, you can arrange for that by writing a post-update hook. I'll give an example of one below that I use to check out the "ui" subdirectory and make it available to Apache.
However, I will say that I think your process could be improved. Developers generally need personal servers that they can test on before pushing to a shared point: otherwise that shared repo is likely to be hideously unreliable. Consider, if I push a change to it and it doesn't work, is that my change that broke it or a side-effect of someone else's?
OK, I use this as a post-update hook:
#!/bin/sh
# Should be run from a Git repository, with a set of refs to update from on the command line.
# This is the post-update hook convention.
info() {
    echo "post-update: $@"
}
die() {
    echo "post-update: $@" >&2
    exit 1
}
output_dir=..
for refname in "$#"; do
case $refname in
refs/heads/master)
new_tree_id=$(git rev-parse $refname:ui)
new_dir="$output_dir/tree-$new_tree_id"
if [ ! -d "$new_dir" ]; then
info "Checking out UI"
mkdir "$new_dir"
git archive --format=tar $new_tree_id | ( cd $new_dir && tar xf - )
fi
prev_link_target=$(readlink $output_dir/current)
if [ -n "$prev_link_target" -a "$prev_link_target" = "tree-$new_tree_id" ]; then
info "UI unchanged"
else
rm -f $output_dir/current
ln -snf "tree-$new_tree_id" "$output_dir/current"
info "UI updated"
title=$(git show --quiet --pretty="format:%s" "$refname" | \
sed -e 's/[^A-Za-z][^A-Za-z]*/_/g')
date=$(git show --quiet --pretty="format:%ci" "$refname" | \
sed -e 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\) \([0-9]*\):\([0-9]*\):\([0-9]*\) +0000/\1\2\3T\4\5\6Z/')
ln -s "tree-$new_tree_id" "$output_dir/${date}__${title}"
fi
;;
esac
done
As mentioned, this just checks out the "ui" subdirectory. That's the ":ui" bit setting new_tree_id. Just take the ":ui" out (or change to "^{tree}") to check out everything.
Checkouts go in the directory containing the git repo, controlled by output_dir. The script expects to be running inside the git repo (which in turn is expected to be bare): this isn't very clean.
Checkouts are put into "tree-XXXX" directories and a "current" symlink managed to point to the most recent. This makes the change from one to another atomic, although it's unlikely to take so long that it matters. It also means reverts reuse the old files. And it also means it chews up disk space as you keep pushing revisions...
I had the same problem, also working with Django.
Agree to testing locally prior to deployment, as already mentioned.
You can then push the local version to a new branch on the server. Then you merge this branch into master. After this you'll see the updated files.
If you accidentally pushed to the master branch, you can do a git reset --hard. However, all changes not committed in the current working branch will be lost. So take care.
I use git to interface with an SVN repository. I have several git branches for the different projects I work on.
Now, whenever I switch from one branch to another using 'git checkout <branch>', all the compiled executables and object files from the previous branch are still there. What I would like to see is that switching from branch A to B results in a tree with all object files and binaries from the last time I worked on branch B.
Is there a way to handle this without creating multiple git repositories?
Update: I understand that executables and binaries should not end up in the repository. I'm a bit disappointed in the fact that all the branching stuff in git is useless to me, as it turns out I'll have to clone my proxy git repository for every branch I want to start. Something I already did for SVN and hoped to avoid with git. Of course, I don't have to do it, but it would result in me doing a new make most of the time after switching between branches (not fun).
What you want is a full context, not just the branch... which is generally out of scope for a version control tool. The best way to do that is to use multiple repositories.
Don't worry about the inefficiency of that though... Make your second repository a clone of the first. Git will automatically use hard links to avoid having multiple copies on disk.
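A quick sketch (paths and branch name are illustrative; a local clone hard-links the object store, so the extra copy is cheap):
# one clone per long-lived branch, each keeping its own build products
git clone /path/to/proxy-repo /path/to/proxy-repo-branchB
cd /path/to/proxy-repo-branchB
git checkout branchB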
Here's a hack to give you what you want.
Since you have separate obj directories, you could modify your Makefiles to make the base location dynamic using something like this:
OBJBASE = `git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/\1\//'`
OBJDIR = "$(OBJBASE).obj"
# branch master: OBJBASE == "master/", OBJDIR == "master/.obj"
# non-git checkout: OBJBASE == "", OBJDIR == ".obj"
That will put your branch name into OBJBASE, which you can use to build your actual objdir location from. I'll leave it to you to modify it to fit your environment and make it friendly to non-git users of your Makefiles.
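A hedged sketch of how it might be consumed (the backticks in OBJBASE are only expanded by the shell, so OBJDIR belongs in recipe lines, not target names; the target and file names are illustrative, and recipe lines must be indented with tabs):
# compile into the per-branch object directory
objs: main.c
	mkdir -p $(OBJDIR)
	$(CC) $(CFLAGS) -c main.c -o $(OBJDIR)/main.o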
This is not git or svn specific - you should have your compiler and other tools direct the output of intermediate files like .o files to directories that are not under version control.
To keep multiple checkouts of the same repo, you can use git --work-tree.
For example,
mkdir $BRANCH.d
GIT_INDEX_FILE=$BRANCH.index git --work-tree $BRANCH.d checkout $BRANCH
You could set your IDE compiler to generate all private temporary files (.class and so on) in <output>\branchName\....
By configuring your compilation settings branch by branch, you can register the name of the branch in the output directory path.
That way, even though private files remain when you git checkout, your project on the new branch is ready to go.
In the contrib/ directory of the git distribution, there is a script called git-new-workdir that allows you to checkout multiples branches in different directories without cloning your repository.
Those files aren't tracked by Git or Subversion, so they're left alone on the assumption that they are of some use to you.
I just do my checkouts in different directories. Saves me the trouble of doing cleanup.
A make clean should not be necessary, because files that differ between the two branches are checked out with the current date as their timestamp.
This means that if your Makefile is correct, only those object files, libs and executables that really changed because of the checkout are compiled again. Which is exactly the reason a Makefile is there in the first place.
The exception is if you need to switch compiler options or even compilers in different branches. In that case probably git-new-workdir is the best solution.
If the compiled executables are files that have been checked in
then git stash solves the problem.
[compile]
git stash save "first branch"
git checkout other_branch
[Fiddle with your code]
[compile]
git stash save "second branch"
git checkout first_branch
git stash apply [whatever index your "first branch" stash has]
# alternatively git stash pop [whatever index...]
If the compiled executables are files that have not and will not be checked in
then simply add them to .gitignore
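For example, a minimal .gitignore along these lines (the patterns are illustrative):
# keep build products out of version control
*.o
*.class
build/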