Dynamically-created 'zip' command not excluding directories properly - regex

I'm the author of a utilty that makes compressing projects using zip a bit easier, especially when you have to compress regularly, such as for updating projects submitted to an application store (like Chrome's Web Store).
I'm attempting to make quite a few improvements, but have run into an issue, described below.
A Quick Overview
My utility's command format is similar to command OPTIONS DEST DIR1 {DIR2 DIR3 DIR4...}. It works by running zip -r DEST.zip DIR1; a fairly simple process. The benefit to my utility, however, is the ability to use a predetermined file (think .gitignore) to ignore specific files/directories, or files/directories which match a pattern.
It's pretty simple -- if the "ignorefile" exists in a target directory (DIR1, DIR2, DIR3, etc), my utility will add exclusions to the zip -r DEST.zip DIR1 command using the pattern -x some_file or -x some_dir/*.
The Issue
I am running into an issue with directory exclusion, however, and I can't quite figure out why (this is probably be because I am still quite the sh novice). I'll run through some examples:
Let's say that I want to ignore two things in my project directory: .git/* and .gitignore. Running command foo.zip project_dir builds the following command:
zip -r foo.zip project -x project/.git/\* -x project/.gitignore
Woohoo! Success! Well... not quite.
In this example, .gitignore is not added to the compressed output file, foo.zip. The directory, .git/*, and all of it's subdirectories (and files) are added to the compressed output file.
Manually running the command:
zip -r foo.zip project_dir -x project/.git/\* -x project/.gitignore
Works as expected, of course, so naturally I am pretty puzzled as to why my identical, but dynamically-built command, does not work.
Attempted Resolutions
I have attempted a few different methods of resolving this to no avail:
Removing -x project/.git/\* from the command, and instead adding each subdirectory and file within that directory, such as -x project/.git/config -x project/.git/HEAD, etc (including children of subdirectories)
Removing the backslash before the asterisk, so that the resulting exclusion option within the command is -x project/.git/*
Bashing my head on the keyboard in angst (I'm really surprised this didn't work, it usually does)
Some notes
My utility uses /bin/sh; I would prefer to keep it that way for maximum compatibility.
I am aware of the git archive feature -- my use of .git/* and .gitignore in the above example is simply as an example; my utility is not dependent on git nor is used exclusively for projects which are git repositories.

I suspected the problem would be in the evaluation of the generated command, since you said the same command when executed directly did right.
So as the comment section says, I think you already found the correct solution. This happens because if you run that variable directly, some things like globs can be expanded directly, instead of passed to the command. And arguments may be messed up, depending on the situation.
Yes, in that case:
eval $COMMAND
is the way to go.

Related

How to use docker to test multiple compiler versions

What is the idiomatic way to write a docker file for building against many different versions of the same compiler?
I have a project which tests against a wide-range of versions of different compilers like gcc and clang as part of a CI job. At some point, the agents for the CI tasks were updated/changed, resulting in newer jobs failing -- and so I've started looking into dockerizing these builds to try to guarantee better reliability and stability.
However, I'm having some difficulty understanding what a proper and idiomatic approach is to producing build images like this without causing a large amount of duplication caused by layers.
For example, let's say I want to build using the following toolset:
gcc 4.8, 4.9, 5.1, ... (various versions)
cmake (latest)
ninja-build
I could write something like:
# syntax=docker/dockerfile:1.3-labs
# Parameterizing here possible, but would cause bloat from duplicated
# layers defined after this
FROM gcc:4.8
ENV DEBIAN_FRONTEND noninteractive
# Set the work directory
WORKDIR /home/dev
COPY . /home/dev/
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Default command is to use CMak
CMD ["cmake"]
However, the installation of tools like ninja-build and cmake occur after the base image, which changes per compiler version. Since these layers are built off of a different parent layer, this would (as far as I'm aware) result in layer duplication for each different compiler version that is used.
One alternative to avoid this duplication could hypothetically be using a smaller base image like alpine with separate installations of the compiler instead. The tools could be installed first so the layers remain shared, and only the compiler changes as the last layer -- however this presents its own difficulties, since it's often the case that certain compiler versions may require custom steps, such as installing certain keyrings.
What is the idiomatic way of accomplishing this? Would this typically be done through multiple docker files, or a single docker file with parameters? Any examples would be greatly appreciated.
I would separate the parts of preparing the compiler and doing the calculation, so the source doesn't become part of the docker container.
Prepare Compiler
For preparing the compiler I would take the ARG approach but without copying the data into the container. In case you wanna fast retry while having enough resources you could spin up multiple instances the same time.
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
ENV DEBIAN_FRONTEND noninteractive
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Set the work directory
VOLUME /src
WORKDIR /src
CMD ["cmake"]
Build it
Here you have few options. You could either prepare a volume with the sources or use bind mounts together with docker exec like this:
#bash style
for compiler in gcc:4.9 gcc:4.8 gcc:5.1
do
docker build -t mytag-${compiler} --build-arg COMPILER=${compiler} .
# place to clean the target folder
docker run -v $(pwd)/src:/src mytag-${compiler}
done
And because the source is not part of the docker image you don't have bloat. You can also have two mounts, one for a readonly source tree and one for the output files.
Note: If you remove the CMake command you could also spin up the docker containers in parallel and use docker exec to start the build. The downside of this is that you have to take care of out of source builds to avoid clashes on the output folder.
put an ARG before the FROM and then invoke the ARG as the FROM
so:
ARG COMPILER=gcc:4.8
FROM ${COMPILER}
# rest goes here
then you
docker build . -t test/clang-8 --build-args COMPILER=clang-8
or similar.
If you want to automate just make a list of compilers and a bash script looping over the lines in your file, and paste the lines as inputs to the tag and COMPILER build args.
As for Cmake, I'd just do:
RUN wget -qO- "https://cmake.org/files/v3.23/cmake-3.23.1-linux-"$(uname -m)".tar.gz" | tar --strip-components=1 -xz -C /usr/local
When copying, I find it cleaner to do
WORKDIR /app/build
COPY . .
edit: formatting
As far as I know, there is no way to do that easily and safely. You could use a RUN --mount=type=cache, but the documentation clearly says that:
Contents of the cache directories persist between builder invocations without invalidating the instruction cache. Cache mounts should only be used for better performance. Your build should work with any contents of the cache directory as another build may overwrite the files or GC may clean it if more storage space is needed.
I have not tried it but I guess the layers are duplicated anyway, you just save time, assuming the cache is not emptied.
The other possible solution you have is similar to the one you mention in the question: starting with the tools installation and then customizing it with the gcc image. Instead of starting with an alpine image, you could start FROM scratch. scratch is basically the empty image, you could COPY the files generated by
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
Then you COPY the entire gcc filesystem. However, I am not sure it will work because the order of the initial layers is now reversed. This means that some files that were in the upper layer (coming from tools) now are in the lower layer and could be overwritten. In the comments, I asked you for a working Dockerfile because I wanted to try this out before answering. If you want, you can try this method and let us know. Anyway, the first step is extracting the files created from the tools layer.
How to extract changes from a layer?
Let's consider this Dockerfile and build it with docker build -t test .:
FROM debian:10
RUN apt update && apt install -y cmake && ( echo "test" > test.txt )
RUN echo "new test" > test.txt
Now that we have built the test image, we should find 3 new layers. You mainly have 2 ways to extract the changes from each layer:
the first is docker inspecting the image and then find the ids of the layers in the /var/lib/docker folder, assuming you are on Linux. Each layer has a diff subfolder containing the changes. Actually, I think it is more complex than this, that is why I would opt for...
skopeo: you can install it with apt install skopeo and it is a very useful tool to operate on docker images. The command you are interested in is copy, that extracts the layers of an image and export them as .tar:
skopeo copy docker-daemon:{image_name}:latest "dir:/home/test_img"
where image_name is test in this case.
Extracting layer content with Skopeo
In the specified folder, you should find some tar files and a configuration file (look at the skopeo copy command output and you will know which one is that). Then extract each {layer}.tar in a different folder and you are done.
Note: to find the layer containing your tools just open the configuration file (maybe using jq because it is json) and take the diff_id that corresponds to the RUN instruction you find in the history property. You should understand it once you open the JSON configuration. This is unnecessary if you have a small image that has, for example, debian as parent image and a single RUN instruction containing the tools you want to install.
Get GCC image content
Now that we have the tool layer content, we need to extract the gcc filesystem. we don't need skopeo for this one, but docker export is enough:
create a container from gcc (with the tag you need):
docker create --name gcc4.8 gcc:4.8
export it as tar:
docker export -o gcc4.8.tar gcc4.8
finally extract the tar file.
Putting all together
The final Dockerfile could be something like:
FROM scratch
COPY ./tools_layer/ /
COPY ./gcc_4.x/ /
In this way, the tools layer is always reused (unless you change the content of that folder, of course), but you can parameterize the gcc_4.x with the ARG instruction for example.
Read carefully: all of this is not tested but you might encounter 2 issues:
the gcc image overwrites some files you have changed in the tools layer. You could check if this happens by computing the diff between the gcc layer folder and the tools layer folder. If it happens, you can only keep track of that file/s and add it/them in the dockerfile after the COPY ./gcc ... with another COPY.
When in the upper layer a file is removed, docker marks that file with a .wh extension (not sure if it is different with skopeo). If in the tools layer you delete a file that exists in the gcc layer, then that file will not be deleted using the above Dockerfile (the COPY ./gcc ... instruction would overwrite the .wh). In this case too, you would need to add an additional RUN rm ... instruction.
This is probably not the correct approach if you have a more complex image that the one you are showing us. In my opinion, you could give this a try and just see if this works out with a single Dockerfile. Obviously, if you have many compilers, each one having its own tools set, the maintainability of this approach could be a real burden. Instead, if the Dockerfile is more or less linear for all the compilers, this might be good (after all, you do not do this every day).
Now the question is: is avoiding layer replication so important that you are willing to complicate the image-building process this much?

Why is the main() function of my program named "test" not getting called? [duplicate]

When running scripts in bash, I have to write ./ in the beginning:
$ ./manage.py syncdb
If I don't, I get an error message:
$ manage.py syncdb
-bash: manage.py: command not found
What is the reason for this? I thought . is an alias for current folder, and therefore these two calls should be equivalent.
I also don't understand why I don't need ./ when running applications, such as:
user:/home/user$ cd /usr/bin
user:/usr/bin$ git
(which runs without ./)
Because on Unix, usually, the current directory is not in $PATH.
When you type a command the shell looks up a list of directories, as specified by the PATH variable. The current directory is not in that list.
The reason for not having the current directory on that list is security.
Let's say you're root and go into another user's directory and type sl instead of ls. If the current directory is in PATH, the shell will try to execute the sl program in that directory (since there is no other sl program). That sl program might be malicious.
It works with ./ because POSIX specifies that a command name that contain a / will be used as a filename directly, suppressing a search in $PATH. You could have used full path for the exact same effect, but ./ is shorter and easier to write.
EDIT
That sl part was just an example. The directories in PATH are searched sequentially and when a match is made that program is executed. So, depending on how PATH looks, typing a normal command may or may not be enough to run the program in the current directory.
When bash interprets the command line, it looks for commands in locations described in the environment variable $PATH. To see it type:
echo $PATH
You will have some paths separated by colons. As you will see the current path . is usually not in $PATH. So Bash cannot find your command if it is in the current directory. You can change it by having:
PATH=$PATH:.
This line adds the current directory in $PATH so you can do:
manage.py syncdb
It is not recommended as it has security issue, plus you can have weird behaviours, as . varies upon the directory you are in :)
Avoid:
PATH=.:$PATH
As you can “mask” some standard command and open the door to security breach :)
Just my two cents.
Your script, when in your home directory will not be found when the shell looks at the $PATH environment variable to find your script.
The ./ says 'look in the current directory for my script rather than looking at all the directories specified in $PATH'.
When you include the '.' you are essentially giving the "full path" to the executable bash script, so your shell does not need to check your PATH variable. Without the '.' your shell will look in your PATH variable (which you can see by running echo $PATH to see if the command you typed lives in any of the folders on your PATH. If it doesn't (as is the case with manage.py) it says it can't find the file. It is considered bad practice to include the current directory on your PATH, which is explained reasonably well here: http://www.faqs.org/faqs/unix-faq/faq/part2/section-13.html
On *nix, unlike Windows, the current directory is usually not in your $PATH variable. So the current directory is not searched when executing commands. You don't need ./ for running applications because these applications are in your $PATH; most likely they are in /bin or /usr/bin.
This question already has some awesome answers, but I wanted to add that, if your executable is on the PATH, and you get very different outputs when you run
./executable
to the ones you get if you run
executable
(let's say you run into error messages with the one and not the other), then the problem could be that you have two different versions of the executable on your machine: one on the path, and the other not.
Check this by running
which executable
and
whereis executable
It fixed my issues...I had three versions of the executable, only one of which was compiled correctly for the environment.
Rationale for the / POSIX PATH rule
The rule was mentioned at: Why do you need ./ (dot-slash) before executable or script name to run it in bash? but I would like to explain why I think that is a good design in more detail.
First, an explicit full version of the rule is:
if the path contains / (e.g. ./someprog, /bin/someprog, ./bin/someprog): CWD is used and PATH isn't
if the path does not contain / (e.g. someprog): PATH is used and CWD isn't
Now, suppose that running:
someprog
would search:
relative to CWD first
relative to PATH after
Then, if you wanted to run /bin/someprog from your distro, and you did:
someprog
it would sometimes work, but others it would fail, because you might be in a directory that contains another unrelated someprog program.
Therefore, you would soon learn that this is not reliable, and you would end up always using absolute paths when you want to use PATH, therefore defeating the purpose of PATH.
This is also why having relative paths in your PATH is a really bad idea. I'm looking at you, node_modules/bin.
Conversely, suppose that running:
./someprog
Would search:
relative to PATH first
relative to CWD after
Then, if you just downloaded a script someprog from a git repository and wanted to run it from CWD, you would never be sure that this is the actual program that would run, because maybe your distro has a:
/bin/someprog
which is in you PATH from some package you installed after drinking too much after Christmas last year.
Therefore, once again, you would be forced to always run local scripts relative to CWD with full paths to know what you are running:
"$(pwd)/someprog"
which would be extremely annoying as well.
Another rule that you might be tempted to come up with would be:
relative paths use only PATH, absolute paths only CWD
but once again this forces users to always use absolute paths for non-PATH scripts with "$(pwd)/someprog".
The / path search rule offers a simple to remember solution to the about problem:
slash: don't use PATH
no slash: only use PATH
which makes it super easy to always know what you are running, by relying on the fact that files in the current directory can be expressed either as ./somefile or somefile, and so it gives special meaning to one of them.
Sometimes, is slightly annoying that you cannot search for some/prog relative to PATH, but I don't see a saner solution to this.
When the script is not in the Path its required to do so. For more info read http://www.tldp.org/LDP/Bash-Beginners-Guide/html/sect_02_01.html
All has great answer on the question, and yes this is only applicable when running it on the current directory not unless you include the absolute path. See my samples below.
Also, the (dot-slash) made sense to me when I've the command on the child folder tmp2 (/tmp/tmp2) and it uses (double dot-slash).
SAMPLE:
[fifiip-172-31-17-12 tmp]$ ./StackO.sh
Hello Stack Overflow
[fifi#ip-172-31-17-12 tmp]$ /tmp/StackO.sh
Hello Stack Overflow
[fifi#ip-172-31-17-12 tmp]$ mkdir tmp2
[fifi#ip-172-31-17-12 tmp]$ cd tmp2/
[fifi#ip-172-31-17-12 tmp2]$ ../StackO.sh
Hello Stack Overflow

Wget returns images in an unknown format (jpg#f=)

I am running the following line:
wget -P "C:\My Web Sites\REGEX" -r --no-parent -A jpg,jpeg https://www.mywebsite.com/directory1/directory2/
and it stops (no errors) without returning more than a small amount of the website (two files). I am then running this:
wget -P "C:\My Web Sites\REGEX" https://www.mywebsite.com/directory1/directory2/ -m
and expecting to see data only from the directory. As a start, I found out that the script downloaded everything from the website as if I gave the https://www.mywebsite.com/ url. Also, the images are returned with an additional string in the extension (e.g. instead of .jpg I get something like .jpg#f=l=q)
Is there anything wrong in my code that causes that? I only want to get the images from the links that are shown in the directory given initially.
If there is nothing I can change, then I want to only download the files that contain .jpg in their names. Then, I have a prepared script in Python that can rename the files to have the original extension. Worst case, I can try Python instead of the CMD in Windows (page scraping)?
Note that --no-parent doesn't work in this case because the images are saved in a different directory. --accept-regex can be used if there is no way to get the correct extension.
PS: I do this thing in order to learn more about the wget options and protect my future hobby website.
UPD: Any suggestions regarding a Python script are welcome.

F# package for Sublime Text Build System absent

I am starting out with F# and trying to get it to work with Sublime Text 3 with a package, https://github.com/fsharp/sublime-fsharp-package. After installing the package using Package Control, I see F# appear as an available language to use in Sublime Text's bottom bar, and syntax highlighting appears to work more or less, from what I can tell, but the build system for F# fails to appear as it should.
So, trying to fix things, I run "build.sh install" and get an error, "Cannot open assembly '.paket/paket.bootstrapper.exe': No such file or directory." I am sort of stuck. Many thanks for any help.
From the comments you've made, you appear to be a little unfamiliar with the Unix underpinnings of OS X. I'll explain those first, then I'll suggest something for you to try that may fix your problem.
Technically, files or directories whose name starts with . are not "reserved for the system" as you put it; they're hidden. Now, it's true that Finder won't allow you to create files or directories whose name starts with ., because Apple didn't want to have to field all the tech-support calls from people who didn't know about the hidden-files feature: "I named my file ... more important stuff for work and now it's gone! Help!!!" But if you're in the Terminal app, then you can easily create files or directories with . as their first letter: mkdir .foo should work. You won't see it when you do ls, but ls -a (a for "all") will show you all files, including hidden files. And you can also do cd .foo and create files inside the hidden .foo directory -- and while the .foo folder won't show up in Finder, it will be perfectly accessible in the Terminal, and to any F# programs you might write.
So when you say that you cloned https://github.com/fsprojects/Paket but it failed to include the .github and .paket directories, I think you just don't know how to see them. You can't see them in the Finder (well, you can if you jump through a couple of hoops but I don't think it's worth the effort), but you can see them with ls -a. Just open your terminal, run cd /Users/Username/Paket, and then run ls -a and I think you'll see that the .paket and .github directories were indeed created by your git clone command.
So what you should probably try is this:
Go to https://github.com/fsprojects/Paket/releases/latest
Download the paket.bootstrapper.exe and paket.exe files. Put them in /Users/Username/Downloads (or wherever the default OS X Downloads directory is if it's different -- just as long as it's somewhere where you can find them easily).
Open the Terminal app.
Go to the directory where you've unpacked the Sublime Text 3 package. I.e., in the Terminal app, run cd /Users/Username/Library/Application\ Support/Sublime\ Text\ 3/Packages/sublime-fsharp-package-master.
Run ls -a and see if there's a .paket directory.
If it does not exist, run mkdir .paket.
Now do cd .paket so you're in the hidden .paket directory under sublime-fsharp-package-master.
Now do ls and see if there's a paket.bootstrapper.exe file.
If it doesn't exist, then copy in the .exe files you downloaded earlier:
cp /Users/Username/Downloads/paket.bootstrapper.exe .
cp /Users/Username/Downloads/paket.exe .
Important: Now do cd .. to go back up to the /Users/Username/Library/Application\ Support/Sublime\ Text\ 3/Packages/sublime-fsharp-package-master/ directory.
Now instead of running /Users/Username/Library/Application\ Support/Sublime\ Text\ 3/Packages/sublime-fsharp-package-master/build.sh install, try running it as ./build.sh install. (And also try ./build.sh Install, since I'm pretty sure the capital I is necessary).
(BTW, If you're not familiar with the syntax that I used in steps 9, 10 and 11, where I used a single . or two dots .. in commands, those are a long-standing Unix idiom: . means "the current directory", and .. means "the parent directory".)
I just looked at the build.sh script that you've been running, and it seems to assume that you've done a cd into the package's base directory (the sublime-fsharp-package-master directory) before running the script. So that could explain why it was failing: you were running it from a different directory, rather than doing a cd first. Hence why I marked step 10 as important: I think that was the root cause of the problem.

Using two asterisks to add a file in git

I want to add a file which has a unique file name but a long preceding path (e.g. a/b/c/d/filename.java). Normally I would add this to my repository by doing
git add *filename.java.
However I have also done this before:
git add a/b/c/d/filename*
So I tried to combine the two:
git add *filename*
but this does something weird. It adds every untracked file. I can see possible reasons for failure but they all should occur in one of the previous two commands so I don't know why this is happening.
My question isn't so much about how to add a file to a git repository with just its file name (although that would be useful).
My question is what is my misunderstanding of the * operation which makes me think the above should work.
Info:
I am using Git Bash for Windows, which is based on minGW.
You're looking at globs
(not regular expressions, which are a different pattern-matching language), and they're expanded by your shell, not by git.
If you want to see how they're going to match, just pass the same glob to another command, eg.
$ ls -d *filename.java
vs
$ ls -d *filename*
(I've just added the -d so ls doesn't show the contents of any directories that match)
Since you're using git bash, and it's possible that glob expansion behaves differently from a regular shell, try
$ git add --dry-run --verbose -- *filename*
for example: this should show you how it really expands the glob and what effect that has.
Note the -- ... if you're using globs that might match a filename with a leading -, it's important to make sure git knows it's a filename and not an option.
Unfortunately, this will only show you the files which both match the glob, and have some difference between the index and working copy.
Answer from author:
The dry run helped a lot, here is what I found:
I was forgetting about the bin folder which I haven't added, so when I performed the dry run I realised it was finding two matches: filename.java and filename.class. When I changed the glob to *filename.j* it worked.
My next step was to remove the .class and try the command again: it worked! It is still unexplained why git bash added everything when it found two matches... since the dry run behaves differently from the actual run I think there must be a bug, but I think that discussion is to be held elsewhere (unless somebody thinks it isn't a bug).
You could try with git add ./**/*.java
Note: I tested with zsh, it should also work for bash as well.