Kaniko does not use cache options during build

I am trying to use the --cache-repo option of the kaniko executor, but it does not use the cache that I saved in AWS ECR, and the GitLab log shows this:
Checking for cached layer [MASKED]/dev-cache:627d56ef7c151b98c02c0de3d3d0d9a5bc8d538b1b1d58632ef977f4501b48f4...
INFO[0521] No cached layer found for cmd COPY --from=build /../../../../..............
I have rebuilt the image with the same tag and the code has not changed, yet the build still takes the same amount of time.
The kaniko version I am using is gcr.io/kaniko-project/executor:v1.9.1.
These are the flags I use in kaniko:
/kaniko/executor --cache=true \
--cache-repo "${URL_ECR}/dev-cache" \
--cache-copy-layers \
--single-snapshot \
--context "${CI_PROJECT_DIR}" ${BUILD_IMAGE_EXTRA_ARGS} \
--dockerfile "${CI_PROJECT_DIR}/Dockerfile" \
--destination "${IMAGE_NAME}:${IMAGE_TAG}" \
--destination "${IMAGE_NAME}:latest" \
--skip-unused-stages \
--snapshotMode=redo \
--use-new-run
Do you have any ideas?

I resolved the issue by removing the --cache-copy-layers and --single-snapshot flags and adding the --cleanup flag.
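With those changes, the working invocation ended up looking roughly like this (same variables and remaining flags as above):
/kaniko/executor --cache=true \
--cache-repo "${URL_ECR}/dev-cache" \
--cleanup \
--context "${CI_PROJECT_DIR}" ${BUILD_IMAGE_EXTRA_ARGS} \
--dockerfile "${CI_PROJECT_DIR}/Dockerfile" \
--destination "${IMAGE_NAME}:${IMAGE_TAG}" \
--destination "${IMAGE_NAME}:latest" \
--skip-unused-stages \
--snapshotMode=redo \
--use-new-run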

Related

How can I reject files named '!' with wget?

I'm using wget to recursively download my university's pages for later analysis and am filtering out lots of extensions.
Here's an MWE with the relevant function:
#!/bin/sh
unwanted_extensions='*.apk,*.asc,*.asp,*.avi,*.bat,*.bib,*.bmp,*.bz2,*.c,*.cdf,*.cgi,*.class,*.cpp,*.crt,*.csp,*.css,*.cur,*.dat,*.dll,*.dvi,*.dwg,*.eot,*.eps,*.epub,*.exe,*.f,*.flv,*.for,*.ggb,*.gif,*.gpx,*.gz,*.h,*.heic,*.hpp,*.hqx,*.htc,*.ico,*.jfif,*.jpe,*.jpeg,*.jpg,*.js,*.lib,*.lnk,*.ly,*.m,*.m4a,*.m4v,*.mdb,*.mht,*.mid,*.mp3,*.mp4,*.mpeg,*.mpg,*.mso,*.odb,*.ogv,*.otf,*.out,*.pdb,*.pdf,*.php,*.plot,*.png,*.ps,*.psz,*.py,*.rar,*.sav,*.sf3,*.sgp,*.sh,*.sib,*.svg,*.swf,*.tex,*.tgz,*.tif,*.tiff,*.tmp,*.ttf,*.txt,*.wav,*.webm,*.webmanifest,*.webp,*.wmf,*.woff,*.woff2,*.wxm,*.wxmx,*.xbm,*.xml,*.xps,*.zip'
unwanted_regex='/([a-zA-Z0-9]+)$'
wget_custom ()
{
    link="$1"
    wget \
        --recursive -e robots=off --level=inf --quiet \
        --ignore-case --adjust-extension --convert-file-only \
        --reject "$unwanted_extensions" \
        --reject-regex "$unwanted_regex" --regex-type posix \
        "$link"
}
wget_custom "$1"
It works nicely and filters most of the stuff. However, these sites serve many PDF and image files named ! (e.g. biologiacelular.ugr.es/pages/planoweb/!) which I don't need and want to reject. Here's what I've tried, without success:
Appending ,! to unwanted_extensions
Appending ,%21 to unwanted_extensions
Changing unwanted_regex to '/([a-zA-Z0-9!]+)$'
Changing unwanted_regex to '/([a-zA-Z0-9\!]+)$'
Adding another --reject-regex '/!$'
Adding another --reject-regex '/\!$' (shown concretely below)
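Concretely, the last two attempts amounted to adding an extra pattern next to the existing one inside wget_custom, roughly like this (only the wget call shown):
wget \
    --recursive -e robots=off --level=inf --quiet \
    --ignore-case --adjust-extension --convert-file-only \
    --reject "$unwanted_extensions" \
    --reject-regex "$unwanted_regex" \
    --reject-regex '/!$' \
    --regex-type posix \
    "$link"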
None of these work and I'm out of ideas. How can I filter the ! files? Thank you!

/bin/sh: jlink: not found. command '/bin/sh -c jlink' returned a non-zero code: 127

The Dockerfile used:
FROM azul/zulu-openjdk-alpine:11 as jdk
RUN jlink \
--module-path /usr/lib/jvm/*/jmods/ \
--verbose \
--add-modules java.base,jdk.unsupported,java.sql,java.desktop \
--compress 2 \
--no-header-files \
--no-man-pages \
--output /opt/jdk-11-minimal
FROM alpine:3.10
ENV JAVA_HOME=/opt/jdk-11-minimal
ENV PATH=$PATH:/opt/jdk-11-minimal/bin
COPY --from=jdk /opt/jdk-11-minimal /opt/jdk-11-minimal
Why can't jlink be found in azul/zulu-openjdk-alpine:11?
The simple answer is that jlink is not on the PATH, so it can't be found.
If you change the RUN line to
RUN /usr/lib/jvm/zulu11/bin/jlink
then it can be found.
However, you still have an error using the wildcard in the module path. Change this to
--module-path /usr/lib/jvm/zulu11/jmods/
and the docker command will complete successfully.
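Putting both fixes together, the first stage would look roughly like this (assuming the Zulu 11 JDK lives under /usr/lib/jvm/zulu11 in this image, as above):
FROM azul/zulu-openjdk-alpine:11 as jdk
RUN /usr/lib/jvm/zulu11/bin/jlink \
--module-path /usr/lib/jvm/zulu11/jmods/ \
--verbose \
--add-modules java.base,jdk.unsupported,java.sql,java.desktop \
--compress 2 \
--no-header-files \
--no-man-pages \
--output /opt/jdk-11-minimal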
Please use $JAVA_HOME/bin/jlink.
For historical reasons, $JAVA_HOME/bin is not included in PATH, so you need to spell the path out explicitly.
I had the same problem, and it is an issue in the image: https://github.com/zulu-openjdk/zulu-openjdk/issues/66
I tried with the version azul/zulu-openjdk-alpine:11.0.7-11.39.15 and it worked.

TPU Based Tuning for CloudML

Are TPUs supported for distributed hyperparameter search? I'm using the tensor2tensor library, which supports CloudML for hyperparameter search, i.e., the following works for me to conduct hyperparameter search for a language model on GPUs:
t2t-trainer \
--model=transformer \
--hparams_set=transformer_tpu \
--problem=languagemodel_lm1b8k_packed \
--train_steps=100000 \
--eval_steps=8 \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--cloud_mlengine \
--hparams_range=transformer_base_range \
--autotune_objective='metrics-languagemodel_lm1b8k_packed/neg_log_perplexity' \
--autotune_maximize \
--autotune_max_trials=100 \
--autotune_parallel_trials=3
However, when I try to utilize TPUs as in the following:
t2t-trainer \
--problem=languagemodel_lm1b8k_packed \
--model=transformer \
--hparams_set=transformer_tpu \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--train_steps=100000 \
--use_tpu=True \
--cloud_mlengine_master_type=cloud_tpu \
--cloud_mlengine \
--hparams_range=transformer_base_range \
--autotune_objective='metrics-languagemodel_lm1b8k_packed/neg_log_perplexity' \
--autotune_maximize \
--autotune_max_trials=100 \
--autotune_parallel_trials=5
I get the error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://ml.googleapis.com/v1/projects/******/jobs?alt=json returned "Field: master_type Error: The specified machine type for master is not supported in TPU training jobs: cloud_tpu"
One of the authors of the tensor2tensor library here. Yup, this was indeed a bug and is now fixed. Thanks for spotting. We'll release a fixed version on PyPI this week, and you can of course clone and install locally from master until then.
The command you used should work just fine now.
I believe there is a bug in the tensor2tensor library:
https://github.com/tensorflow/tensor2tensor/blob/6a7ef7f79f56fdcb1b16ae76d7e61cb09033dc4f/tensor2tensor/utils/cloud_mlengine.py#L281
It's the worker_type (and not the master_type) that needs to be set for Cloud ML Engine.
To answer the original question though, yes, HP Tuning should be supported for TPUs, but the error above is orthogonal to that.
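In other words, in the Cloud ML Engine training input the TPU should be requested as the worker, not the master. A rough sketch of the relevant job config (the field values here are illustrative assumptions, not taken from tensor2tensor):
trainingInput:
  scaleTier: CUSTOM
  masterType: standard   # an ordinary VM coordinates the job
  workerType: cloud_tpu  # the TPU is attached as a worker
  workerCount: 1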

bitbake grpc cross compile/configure failing with error c-ares::cares references the file /usr/lib/libcares.so.2.2.0

I am having issues finding the c-ares dependency when building grpc in OpenEmbedded. The error during configure, when it looks for c-ares, shows up in the log as:
--
Found ZLIB: ....../poky/build/tmp-glibc/sysroots/arm7/usr/lib/libz.so (found version "1.2.8")
CMake Error at ....../poky/build/tmp-glibc/sysroots/arm7/usr/lib/cmake/c-ares/c-ares-targets.cmake:70 (message):
The imported target "c-ares::cares" references the file
"/usr/lib/libcares.so.2.2.0"
but this file does not exist. Possible reasons include:
* The file was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and contained
"/home/...../poky/build/tmp-glibc/sysroots/arm7/usr/lib/cmake/c-ares/c-ares-targets.cmake"
but not all the files it references.
--
The issue seems to be how CMake has configured the import prefix for c-ares, which is set as below in the file poky/build/tmp-glibc/sysroots/arm7/usr/lib/cmake/c-ares/c-ares-targets.cmake. I believe it should be the path into the target staging directory:
set(_IMPORT_PREFIX "/usr")
Can someone please help me identify the issue here? What needs to be configured in the c-ares recipe in order to get the _IMPORT_PREFIX right?
Any help is much appreciated.
Thanks
I came across this issue today when building a newer gRPC in an older (daisy) BitBake environment. The solutions I came to were either backporting this upstream change to the cmake.bbclass or hacking in the updated variable definitions in a .bbappend to the cmake invocation by way of the EXTRA_OECMAKE variable.
I chose the latter, as I only seemed to need this for c-ares, and wanted to limit my impact. I did not end up digging into the difference between how c-ares and other gRPC dependencies (e.g. gflags) generate CMake export targets files. I assume there's some way the ultimate target paths are generated within the respective projects' CMakeLists.txt files.
diff --git a/meta/classes/cmake.bbclass b/meta/classes/cmake.bbclass
index b18152a8ed..5203d8aca1 100644
--- a/meta/classes/cmake.bbclass
+++ b/meta/classes/cmake.bbclass
@@ -108,15 +108,15 @@ cmake_do_configure() {
${OECMAKE_SITEFILE} \
${OECMAKE_SOURCEPATH} \
-DCMAKE_INSTALL_PREFIX:PATH=${prefix} \
- -DCMAKE_INSTALL_BINDIR:PATH=${bindir} \
- -DCMAKE_INSTALL_SBINDIR:PATH=${sbindir} \
- -DCMAKE_INSTALL_LIBEXECDIR:PATH=${libexecdir} \
+ -DCMAKE_INSTALL_BINDIR:PATH=${@os.path.relpath(d.getVar('bindir', True), d.getVar('prefix', True))} \
+ -DCMAKE_INSTALL_SBINDIR:PATH=${@os.path.relpath(d.getVar('sbindir', True), d.getVar('prefix', True))} \
+ -DCMAKE_INSTALL_LIBEXECDIR:PATH=${@os.path.relpath(d.getVar('libexecdir', True), d.getVar('prefix', True))} \
-DCMAKE_INSTALL_SYSCONFDIR:PATH=${sysconfdir} \
- -DCMAKE_INSTALL_SHAREDSTATEDIR:PATH=${sharedstatedir} \
+ -DCMAKE_INSTALL_SHAREDSTATEDIR:PATH=${@os.path.relpath(d.getVar('sharedstatedir', True), d.getVar('prefix', True))} \
-DCMAKE_INSTALL_LOCALSTATEDIR:PATH=${localstatedir} \
- -DCMAKE_INSTALL_LIBDIR:PATH=${libdir} \
- -DCMAKE_INSTALL_INCLUDEDIR:PATH=${includedir} \
- -DCMAKE_INSTALL_DATAROOTDIR:PATH=${datadir} \
+ -DCMAKE_INSTALL_LIBDIR:PATH=${@os.path.relpath(d.getVar('libdir', True), d.getVar('prefix', True))} \
+ -DCMAKE_INSTALL_INCLUDEDIR:PATH=${@os.path.relpath(d.getVar('includedir', True), d.getVar('prefix', True))} \
+ -DCMAKE_INSTALL_DATAROOTDIR:PATH=${@os.path.relpath(d.getVar('datadir', True), d.getVar('prefix', True))} \
-DCMAKE_INSTALL_SO_NO_EXE=0 \
-DCMAKE_TOOLCHAIN_FILE=${WORKDIR}/toolchain.cmake \
-DCMAKE_VERBOSE_MAKEFILE=1 \
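For the .bbappend route, a minimal sketch would be something like the following (the file name and the exact set of variables are my assumptions; the idea is simply to pass install directories relative to the prefix so the exported c-ares-targets.cmake no longer bakes in absolute /usr paths):
# c-ares_%.bbappend (hypothetical file name)
EXTRA_OECMAKE += " \
    -DCMAKE_INSTALL_LIBDIR:PATH=lib \
    -DCMAKE_INSTALL_INCLUDEDIR:PATH=include \
    -DCMAKE_INSTALL_BINDIR:PATH=bin \
"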

Hadoop: compress file in HDFS?

I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create an MR job with an IdentityMapper and an IdentityReducer that uses LZO compression?
For me, it's lower overhead to write a Hadoop Streaming job to compress files.
This is the command I run:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-Dmapred.reduce.tasks=0 \
-input <input-path> \
-output $OUTPUT \
-mapper "cut -f 2"
I'll also typically stash the output in a temp folder in case something goes wrong:
OUTPUT=/tmp/hdfs-gzip-`basename $1`-$RANDOM
One additional note: I do not specify a reducer in the streaming job, but you certainly can. It will force all the lines to be sorted, which can take a long time with a large file. There might be a way to get around this by overriding the partitioner, but I didn't bother figuring that out. The unfortunate part is that you potentially end up with many small files that do not utilize HDFS blocks efficiently. That's one reason to look into Hadoop Archives.
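For reference, packing a directory of small output files into a Hadoop Archive looks roughly like this (paths are placeholders):
hadoop archive -archiveName output.har -p /path/to/parent output-dir /path/to/archives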
I suggest you write a MapReduce job that, as you say, just uses the Identity mapper. While you are at it, you should consider writing the data out to sequence files to improve load performance. You can also store sequence files with block-level or record-level compression. You should see what works best for you, as each is optimized for different types of records.
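If you want to try the block-compressed sequence-file route without writing a Java job, a streaming sketch might look like this (untested; property names are from the old mapred API, and you would swap your LZO codec class in for GzipCodec):
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \
-Dmapred.output.compress=true \
-Dmapred.output.compression.type=BLOCK \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-Dmapred.reduce.tasks=0 \
-input <input-path> \
-output <output-path> \
-mapper /bin/cat \
-outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat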
The streaming command from Jeff Wu, along with a concatenation of the compressed files, will give a single compressed file. When a non-Java mapper is passed to the streaming job and the input format is text, streaming outputs just the value and not the key.
hadoop jar contrib/streaming/hadoop-streaming-1.0.3.jar \
-Dmapred.reduce.tasks=0 \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-input filename \
-output /filename \
-mapper /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
hadoop fs -cat /path/part* | hadoop fs -put - /path/compressed.gz
This is what I've used:
/*
* Pig script to compress a directory
* input: hdfs input directory to compress
* hdfs output directory
*
*
*/
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
--comma separated list of hdfs directories to compress
input0 = LOAD '$IN_DIR' USING PigStorage();
--single output directory
STORE input0 INTO '$OUT_DIR' USING PigStorage();
Though it's not LZO, so it may be a bit slower.
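For completeness, a script like that is launched with the directories passed as Pig parameters, roughly like this (the script file name is just an example):
pig -param IN_DIR=/path/to/input -param OUT_DIR=/path/to/output compress_dir.pig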
@Chitra
I cannot comment due to reputation issues.
Here is everything in one command: instead of using the second command, you can reduce directly into one compressed file.
hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
-Dmapred.reduce.tasks=1 \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
-input /input/raw_file \
-output /archives/ \
-mapper /bin/cat \
-reducer /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
Thus, you gain a lot of space by having only one compressed file.
For example, let's say I have 4 files of 10 MB each (plain text, JSON formatted).
The map-only job gives me 4 files of 650 KB each.
If I map and reduce, I get 1 file of 1.05 MB.
I know this is an old thread, but for anyone following it (like me), it is useful to know that either of the following two commands gives you a tab (\t) character at the end of each line:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-Dmapred.reduce.tasks=0 \
-input <input-path> \
-output $OUTPUT \
-mapper "cut -f 2"
hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
-Dmapred.reduce.tasks=1 \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
-input /input/raw_file \
-output /archives/ \
-mapper /bin/cat \
-reducer /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
Since hadoop-streaming.jar adds x'09' (a tab) at the end of each line, the fix is to set the following two parameters to the delimiter you actually use (in my case it was ,):
-Dstream.map.output.field.separator=, \
-Dmapred.textoutputformat.separator=, \
Full command to execute:
hadoop jar <HADOOP_HOME>/jars/hadoop-streaming-2.6.0-cdh5.4.11.jar \
-Dmapred.reduce.tasks=1 \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dstream.map.output.field.separator=, \
-Dmapred.textoutputformat.separator=, \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.Lz4Codec \
-input file:////home/admin.kopparapu/accenture/File1_PII_Phone_part3.csv \
-output file:///home/admin.kopparapu/accenture/part3 \
-mapper /bin/cat \
-reducer /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
Well, if you compress a single file, you may save some space, but you can't really use Hadoop's power to process that file since the decompression has to be done by a single Map task sequentially. If you have lots of files, there's Hadoop Archive, but I'm not sure it includes any kind of compression. The main use case for compression I can think of is compressing the output of Maps to be sent to Reduces (save on network I/O).
Oh, to answer your question more completely, you'd probably need to implement your own RecordReader and/or InputFormat to make sure the entire file gets read by a single Map task, and that it uses the correct decompression filter.
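A minimal sketch of that idea with the old mapred API (class name is hypothetical): subclass the input format and mark files as non-splittable, so each compressed file is handled by exactly one map task and its decompression codec is applied in one pass.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        // Never split: the whole (compressed) file goes to a single map task.
        return false;
    }
}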