How are job archives generated in Flink? - hdfs

When we run Flink on YARN, the finished/terminated/failed jobs are stored in a job archive. For example, we have the following job archives on HDFS. Any pointers on how these job archives are generated and stored on HDFS?
-rw-r--r-- 3 aaaa hdfs 10568 2019-07-09 18:34 /tmp/flink/completed-jobs/f909a4ca58cbf1d233a798f7de9489e0
-rw-r--r-- 3 bbbb hdfs 9966 2019-06-20 22:08 /tmp/flink/completed-jobs/fa1fb72ea43348fa84232e7517ca3c91
-rw-r--r-- 3 cccc hdfs 12487 2019-06-26 20:45 /tmp/flink/completed-jobs/fa2b34566384ec621e0d05a2073b8e90
-rw-r--r-- 3 dddd hdfs 57212 2019-07-16 00:41 /tmp/flink/completed-jobs/fa76acb920eec0880a986fb23fbb9149

I found one related file in the Flink repo:
https://github.com/apache/flink/blob/57a2b754f6a5d8844aa35afb511901ad7ee43068/flink-runtime/src/main/java/org/apache/flink/runtime/history/FsJobArchivist.java#L71
HistoryServerArchivist is called from flink/runtime/dispatcher/Dispatcher.java
https://github.com/apache/flink/blob/57a2b754f6a5d8844aa35afb511901ad7ee43068/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L126
@Override
public CompletableFuture<Acknowledge> archiveExecutionGraph(AccessExecutionGraph executionGraph) {
    try {
        FsJobArchivist.archiveJob(archivePath, executionGraph.getJobID(), jsonArchivist.archiveJsonWithPath(executionGraph));
        return CompletableFuture.completedFuture(Acknowledge.get());
    } catch (IOException e) {
        return FutureUtils.completedExceptionally(e);
    }
}
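If I follow the code correctly: when a job reaches a globally terminal state (finished, cancelled, failed), the Dispatcher hands the job's AccessExecutionGraph to the HistoryServerArchivist, and FsJobArchivist.archiveJob writes the job's web UI/REST JSON responses into a single file named after the JobID under the configured archive directory. That is why each entry under /tmp/flink/completed-jobs is one file per job. The target directory comes from the Flink configuration; a minimal sketch of the relevant keys (the values are only placeholders matching the paths in the question):

# flink-conf.yaml (illustrative values)
# Directory where the JobManager/Dispatcher uploads finished-job archives
jobmanager.archive.fs.dir: hdfs:///tmp/flink/completed-jobs
# Directory (or comma-separated list) the HistoryServer polls for new archives
historyserver.archive.fs.dir: hdfs:///tmp/flink/completed-jobs
# Polling interval in milliseconds
historyserver.archive.fs.refresh-interval: 10000

If I read the docs correctly, a separately started HistoryServer (bin/historyserver.sh start) then picks these archives up and serves them in its web UI.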

Related

Connecting to LND Node through a local-running Django Rest API

I am trying to connect to my LND node running on AWS (I know it is not the best-case scenario for an LND node, but this time I had no other way of doing it) from my locally running Django REST API. The issue is that it cannot find the admin.macaroon file even though the file is in the mentioned directory. Below I am giving some more detailed information:
view.py
class GetInfo(APIView):
    def get(self, request):
        REST_HOST = "https://ec2-18-195-111-81.eu-central-1.compute.amazonaws.com"
        MACAROON_PATH = "/home/ubuntu/.lnd/data/chain/bitcoin/mainnet/admin.macaroon"
        # url = "https://ec2-18-195-111-81.eu-central-1.compute.amazonaws.com/v1/getinfo"
        TLS_PATH = "/home/ubuntu/.lnd/tls.cert"
        url = f"https//{REST_HOST}/v1/getinfo"
        macaroon = codecs.encode(open(MACAROON_PATH, "rb").read(), "hex")
        headers = {"Grpc-Metadata-macaroon": macaroon}
        r = requests.get(url, headers=headers, verify=TLS_PATH)
        return Response(json.loads(r.text))
The node is running with no problem on AWS. This is what I get when I run lncli getinfo:
$ lncli getinfo
{
    "version": "0.15.5-beta commit=v0.15.5-beta",
    "commit_hash": "c0a09209782b1c62c3393fcea0844exxxxxxxxxx",
    "identity_pubkey": "mykey",
    "alias": "020d4da213770890e1c1",
    "color": "#3399ff",
    "num_pending_channels": 0,
    "num_active_channels": 0,
    "num_inactive_channels": 0,
    "uris": [
    ....
and the permissions are as below:
$ ls -l
total 138404
-rwxrwxr-x 1 ubuntu ubuntu 293 Feb 6 09:38 admin.macaroon
drwxrwxr-x 2 ubuntu ubuntu 4096 Feb 5 14:48 bin
drwxr-xr-x 6 ubuntu ubuntu 4096 Jan 27 20:17 bitcoin-22.0
drwxrwxr-x 4 ubuntu ubuntu 4096 Feb 1 16:39 go
-rw-rw-r-- 1 ubuntu ubuntu 141702072 Mar 15 2022 go1.18.linux-amd64.tar.gz
drwxrwxr-x 72 ubuntu ubuntu 4096 Feb 1 16:36 lnd
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 27 20:13 screenlog.0
The error I get is: [Errno 2] No such file or directory: '/home/ubuntu/.lnd/data/chain/bitcoin/mainnet/admin.macaroon'
I guess the problem is how I access the node from my API, but I have no idea how to reach files on an EC2 instance from an external API.
Thank you in advance.
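For what it's worth, the [Errno 2] is raised by the open(MACAROON_PATH, "rb") call: that path exists on the EC2 instance, but the Django process runs on your local machine, so there is nothing to open locally. A common approach is to copy admin.macaroon and tls.cert to the machine running Django (for example with scp) and point the view at those local copies; note also that REST_HOST already contains the scheme, so the f-string should not prepend another "https//". A minimal sketch, where the local file paths and the :8080 REST port are assumptions rather than values from the question:

import codecs
import json

import requests
from rest_framework.response import Response
from rest_framework.views import APIView

# Hypothetical local copies, fetched e.g. with:
#   scp ubuntu@ec2-18-195-111-81.eu-central-1.compute.amazonaws.com:/home/ubuntu/.lnd/data/chain/bitcoin/mainnet/admin.macaroon ./lnd/
#   scp ubuntu@ec2-18-195-111-81.eu-central-1.compute.amazonaws.com:/home/ubuntu/.lnd/tls.cert ./lnd/
MACAROON_PATH = "./lnd/admin.macaroon"
TLS_PATH = "./lnd/tls.cert"
# lnd's REST proxy listens on 8080 by default; the port must be reachable from outside
REST_HOST = "ec2-18-195-111-81.eu-central-1.compute.amazonaws.com:8080"


class GetInfo(APIView):
    def get(self, request):
        url = f"https://{REST_HOST}/v1/getinfo"
        # lnd expects the macaroon hex-encoded in this header
        with open(MACAROON_PATH, "rb") as macaroon_file:
            macaroon = codecs.encode(macaroon_file.read(), "hex").decode()
        headers = {"Grpc-Metadata-macaroon": macaroon}
        r = requests.get(url, headers=headers, verify=TLS_PATH)
        return Response(json.loads(r.text))

Keep in mind that verify=TLS_PATH only succeeds if the node's TLS certificate actually covers the EC2 hostname (lnd's tlsextradomain option), and the EC2 security group has to allow the REST port.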

Does Django save zipped directories as files or what could be going on here?

I'm working with OpenEdX. It has a plugin system, called XBlocks, that in this case allows importing content created by third-party "studio apps". This content can be uploaded as a zip file; it is then processed by the following code:
@XBlock.handler
def studio_submit(self, request, _suffix):
    self.display_name = request.params["display_name"]
    self.width = request.params["width"]
    self.height = request.params["height"]
    self.has_score = request.params["has_score"]
    self.weight = request.params["weight"]
    self.icon_class = "problem" if self.has_score == "True" else "video"
    response = {"result": "success", "errors": []}
    if not hasattr(request.params["file"], "file"):
        # File not uploaded
        return self.json_response(response)
    package_file = request.params["file"].file
    self.update_package_meta(package_file)
    # First, save scorm file in the storage for mobile clients
    if default_storage.exists(self.package_path):
        logger.info('Removing previously uploaded "%s"', self.package_path)
        default_storage.delete(self.package_path)
    default_storage.save(self.package_path, File(package_file))
    logger.info('Scorm "%s" file stored at "%s"', package_file, self.package_path)
    # Then, extract zip file
    if default_storage.exists(self.extract_folder_base_path):
        logger.info(
            'Removing previously unzipped "%s"', self.extract_folder_base_path
        )
        recursive_delete(self.extract_folder_base_path)
    with zipfile.ZipFile(package_file, "r") as scorm_zipfile:
        for zipinfo in scorm_zipfile.infolist():
            default_storage.save(
                os.path.join(self.extract_folder_path, zipinfo.filename),
                scorm_zipfile.open(zipinfo.filename),
            )
    try:
        self.update_package_fields()
    except ScormError as e:
        response["errors"].append(e.args[0])
    return self.json_response(response)
where the code
default_storage.save(
    os.path.join(self.extract_folder_path, zipinfo.filename),
    scorm_zipfile.open(zipinfo.filename),
)
is the origin of the following (Django) error trace:
cms_1 | File "/openedx/venv/lib/python3.5/site-packages/openedxscorm/scormxblock.py", line 193, in studio_submit
cms_1 | scorm_zipfile.open(zipinfo.filename),
cms_1 | File "/openedx/venv/lib/python3.5/site-packages/django/core/files/storage.py", line 52, in save
cms_1 | return self._save(name, content)
cms_1 | File "/openedx/venv/lib/python3.5/site-packages/django/core/files/storage.py", line 249, in _save
cms_1 | raise IOError("%s exists and is not a directory." % directory)
cms_1 | OSError: /openedx/media/scorm/c154229b568d45128e1098b530267a35/a346b1db27aaa89b89b31e1c3e2a1af04482abad/assets exists and is not a directory.
I posted the issue on GitHub too.
From the Python docs: exception FileExistsError: raised when trying to create a file or directory which already exists. Corresponds to errno EEXIST.
I don't really understand what is going on. It's based on a hairball of JavaScript in layered Docker containers, so I can't readily hack & print for extra info.
The only thing I found was that some of the folders in the zip file are written to the Docker volume as files instead of directories at the moment the error is thrown. This may, however, be expected, and these files might be rewritten as or changed to directories later, perhaps only on Linux (I'm not sure).
The error mentions the assets folder:
root@93f0d2b9667f:/openedx/media/scorm/5e085cbc04e24b3b911802f7cba44296/92b12100be7651c812a1d29a041153db5ba89239# ls -la
total 84
drwxr-xr-x 2 root root 4096 Aug 2 22:17 .
drwxr-xr-x 3 root root 4096 Aug 2 22:17 ..
-rw-r--r-- 1 root root 4398 Aug 2 22:17 adlcp_rootv1p2.xsd
-rw-r--r-- 1 root root 0 Aug 2 22:17 assets
-rw-r--r-- 1 root root 0 Aug 2 22:17 course
-rw-r--r-- 1 root root 14560 Aug 2 22:17 imscp_rootv1p1p2.xsd
-rw-r--r-- 1 root root 1847 Aug 2 22:17 imsmanifest.xml
-rw-r--r-- 1 root root 22196 Aug 2 22:17 imsmd_rootv1p2p1.xsd
-rw-r--r-- 1 root root 1213 Aug 2 22:17 ims_xml.xsd
-rw-r--r-- 1 root root 1662 Aug 2 22:17 index.html
-rw-r--r-- 1 root root 0 Aug 2 22:17 libraries
-rw-r--r-- 1 root root 1127 Aug 2 22:17 log_output.html
-rw-r--r-- 1 root root 481 Aug 2 22:17 main.html
-rw-r--r-- 1 root root 759 Aug 2 22:17 offline_API_wrapper.js
-rw-r--r-- 1 root root 0 Aug 2 22:17 player
-rw-r--r-- 1 root root 1032 Aug 2 22:17 popup.html
root@93f0d2b9667f:/openedx/media/scorm/5e085cbc04e24b3b911802f7cba44296/92b12100be7651c812a1d29a041153db5ba89239# cd assets
bash: cd: assets: Not a directory
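A plausible explanation, judging from the zero-byte assets, course, libraries, and player entries: the zip's directory entries (names ending in "/") are themselves passed to default_storage.save(), which stores each of them as an empty regular file. When the loop later tries to save e.g. assets/somefile, the storage backend finds a regular file called assets where a directory should be and raises "exists and is not a directory". Below is a minimal sketch of an extraction loop that skips directory entries; extract_package is a hypothetical standalone helper, not the XBlock's real method, and whether this matches the fix applied upstream is an assumption on my part.

import os
import zipfile

from django.core.files.storage import default_storage


def extract_package(package_file, extract_folder_path):
    """Extract a SCORM zip into default_storage, skipping directory entries."""
    with zipfile.ZipFile(package_file, "r") as scorm_zipfile:
        for zipinfo in scorm_zipfile.infolist():
            if zipinfo.filename.endswith("/"):
                # Directory entry: saving it creates a zero-byte file that later
                # blocks the real files underneath it. (zipinfo.is_dir() only
                # exists on Python 3.6+, and the trace shows Python 3.5.)
                continue
            default_storage.save(
                os.path.join(extract_folder_path, zipinfo.filename),
                scorm_zipfile.open(zipinfo.filename),
            )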

Program deleting file of other user

I have two users. User1 is running a program which tries to delete a file belonging to user2, but my program always returns "permission denied".
When I try to delete the file myself as user1 with the rm command, there is no problem. The permissions of the files are 775, and my user1 is in the group of user2. This group is also the owner of the files. The permissions of the directory in which the files are located are 775 too.
To remove the files, the program I have written uses the remove() function from C/C++.
Does anyone have a solution or idea?
I have asked this question on unix.stackexchange.com before; they sent me here.
Here is my code:
#include <cstdio>   // remove()
#include <cstring>  // strerror()
#include <cerrno>   // errno
#include <iostream>

void deleteFile()
{
    if (0 != remove("File1.txt"))
        std::cout << "Error deleting File: " << std::strerror(errno) << std::endl;
    if (0 != remove("File2.txt"))
        std::cout << "Error deleting File: " << std::strerror(errno) << std::endl;
}
I have renamed the files, but I know the original paths are correct; I have already tested this.
More information:
OK, I ran the program as user2 and the files were deleted without any problems.
groups user1
users user2
groups user2
user2 adm www-data plugdev users ftp vsftpd
ls -lah
drwxrwxr-x 7 user2 user2 4.0K Nov 27 14:13 .
drwxrw-r-x 4 user2 user2 4.0K Nov 11 12:34 ..
-rwxrwxr-x 1 user2 user2 50 Nov 12 15:12 File1.txt
-rwxrwxr-x 1 user2 user2 826 Nov 27 14:13 File2.txt
Try running rm and your command with strace as user1:
strace your_program
strace rm File1.txt File2.txt
You should see what your program and rm are doing differently.

Regex that matches [number-n]-WORD but not [number]-WORD

I want to create a shell script that iterates through folders and deletes those that match [versionnumber-n], where n > 0.
The version number is in a file whose content is like:
MAVEN_VERSION=1.2.7.0-SNAPSHOT
Here's an example:
The file listing is like
drwxrwxr-x 4 jenkins jenkins 4096 Jul 29 10:54 ./
drwxrwxr-x 20 jenkins jenkins 4096 Jul 4 09:20 ../
drwxr-xr-x 2 jenkins jenkins 4096 Jul 23 12:35 1.2.6.0-SNAPSHOT/
drwxr-xr-x 2 jenkins jenkins 4096 Jul 28 23:13 1.2.7.0-SNAPSHOT/
-rw-rw-r-- 1 jenkins jenkins 403 Jul 29 10:11 maven-metadata-local.xml
-rw-r--r-- 1 jenkins jenkins 403 Jul 28 23:13 maven-metadata-mtx-snapshots.xml
-rw-r--r-- 1 jenkins jenkins 40 Jul 28 23:13 maven-metadata-mtx-snapshots.xml.sha1
-rw-r--r-- 1 jenkins jenkins 403 Jul 28 23:13 maven-metadata.xml
-rw-r--r-- 1 jenkins jenkins 32 Jul 28 23:13 maven-metadata.xml.md5
-rw-r--r-- 1 jenkins jenkins 40 Jul 28 23:13 maven-metadata.xml.sha1
-rw-r--r-- 1 jenkins jenkins 186 Jul 28 23:13 resolver-status.properties
I want the script to delete the folder 1.2.6.0-SNAPSHOT/ but not 1.2.7.0-SNAPSHOT/. If there were folders like 1.2.5.0-SNAPSHOT/ or 1.2.4.0-SNAPSHOT/, it should delete them too.
What I have at this point:
.*(?!1.2.7.0)(-SNAPSHOT)
Unfortunately, this matches both folders (in the example above).
Edit: I just hit submit too early ...
With Bash you can just use negation with extended pathname expansion.
shopt -s extglob
rm -fr /dir/1.2.!(7).0-SNAPSHOT
Dry run example:
$ ls -1
1.2.10.0-SNAPSHOT
1.2.5.0-SNAPSHOT
1.2.6.0-SNAPSHOT
1.2.7.0-SNAPSHOT
a
$ echo rm -fr 1.2.!(7).0-SNAPSHOT
rm -fr 1.2.10.0-SNAPSHOT 1.2.5.0-SNAPSHOT 1.2.6.0-SNAPSHOT
See Extended Pattern Matching and Filename Expansion.
How I did it in the end:
if [ -z "$MAVEN_VERSION_SERVER" ]
then
    echo "\$MAVEN_VERSION_SERVER NOT set! \n exiting ..."
else
    find /var/lib/jenkins/.m2/repository/de/db/mtxbes -mindepth 1 -type d -regex '.*SNAPSHOT' -not -name $MAVEN_VERSION_SERVER | xargs -d '\n' rm -fr
fi
($MAVEN_VERSION_SERVER is set and read by Groovy scripts beforehand.)

logback skipping log files on AWS EC2

I'm using logback for logging from an app deployed in Tomcat, with a fairly simple setup (see code fragments). We use a RollingFileAppender, with TimeBasedRollingPolicy, set for daily rollover. When running locally, everything appears to be fine. When running in AWS in an EC2 instance, I'm seeing that some log files are missing.
I wrote a really simple app that does nothing but log once per second with a counter, and then a logback config that rolls every minute. For this particular test, we're seeing every third log file is missing.
So, for example, we'll get:
-rw-r--r-- 1 tomcat tomcat 891 May 13 18:46 logtest_tomcat.2014-05-13_1845.0.log.gz
-rw-r--r-- 1 tomcat tomcat 499 May 13 18:47 logtest_tomcat.2014-05-13_1846.0.log.gz
-rw-r--r-- 1 tomcat tomcat 541 May 13 18:49 logtest_tomcat.2014-05-13_1848.0.log.gz
-rw-r--r-- 1 tomcat tomcat 519 May 13 18:50 logtest_tomcat.2014-05-13_1849.0.log.gz
-rw-r--r-- 1 tomcat tomcat 532 May 13 18:52 logtest_tomcat.2014-05-13_1851.0.log.gz
-rw-r--r-- 1 tomcat tomcat 510 May 13 18:53 logtest_tomcat.2014-05-13_1852.0.log.gz
-rw-r--r-- 1 tomcat tomcat 536 May 13 18:55 logtest_tomcat.2014-05-13_1854.0.log.gz
-rw-r--r-- 1 tomcat tomcat 1226 May 13 18:56 logtest_tomcat.2014-05-13_1855.0.log.gz
-rw-r--r-- 1 tomcat tomcat 531 May 13 18:58 logtest_tomcat.2014-05-13_1857.0.log.gz
-rw-r--r-- 1 tomcat tomcat 496 May 13 18:59 logtest_tomcat.2014-05-13_1858.0.log.gz
-rw-r--r-- 1 tomcat tomcat 1244 May 13 19:01 logtest_tomcat.2014-05-13_1900.0.log.gz
-rw-r--r-- 1 tomcat tomcat 496 May 13 19:02 logtest_tomcat.2014-05-13_1901.0.log.gz
-rw-r--r-- 1 tomcat tomcat 514 May 13 19:04 logtest_tomcat.2014-05-13_1903.0.log.gz
-rw-r--r-- 1 tomcat tomcat 500 May 13 19:05 logtest_tomcat.2014-05-13_1904.0.log.gz
-rw-r--r-- 1 tomcat tomcat 522 May 13 19:07 logtest_tomcat.2014-05-13_1906.0.log.gz
The file name format is yyyy-MM-dd_HHmm, so you can see that 1847, 1850, 1853, 1856, 1859, 1902, and 1905 are all missing.
I've checked the contents: the sequential numbering on the log statements jumps by 60 for the missing logs, so it's not that multiple minutes are being collapsed into a single rolled-over log.
We also thought it might be due to our Splunk forwarder; we ran the test both with and without the Splunk forwarder running and got the same results: every third log file is missing.
Here's the logback appender for this test:
<appender name="daily" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${bc.logs.home}/logtest_tomcat.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <!-- Rollover every minute for this test -->
        <fileNamePattern>${bc.logs.home}/logtest_tomcat.%d{yyyy-MM-dd_HHmm}.%i.log.gz</fileNamePattern>
        <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
            <!-- or whenever the file size reaches 250MB -->
            <maxFileSize>250MB</maxFileSize>
        </timeBasedFileNamingAndTriggeringPolicy>
        <maxHistory>60</maxHistory>
    </rollingPolicy>
    <append>true</append>
    <encoder>
        <pattern>%d{"yyyy-MM-dd HH:mm:ss,SSS z", UTC} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
</appender>
And here's my little driver class (BCLog is a simple wrapper around slf4j logging, instantiated by Logger log = LoggerFactory.getLogger(clazz);):
package com.sirsidynix.logtest.biz.svc.impl;

import com.sirsidynix.bccommon.util.BCLog;
import com.sirsidynix.bccommon.util.BCLogFactory;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.InitializingBean;

public class JustLogIt implements InitializingBean, DisposableBean
{
    private static final BCLog LOG = BCLogFactory.getLog(JustLogIt.class);

    private Thread thread;

    @Override
    public void afterPropertiesSet() throws Exception
    {
        LOG.info("Firing up JustLogIt thread");
        thread = new Thread(){
            @Override
            public void run()
            {
                long iteration = 0;
                while (true)
                {
                    try
                    {
                        Thread.sleep(1000);
                        iteration++;
                        LOG.info("Logging Iteration " + iteration);
                    }
                    catch (InterruptedException e)
                    {
                        LOG.info("LogIt thread sleep interrupted!!!");
                    }
                }
            }
        };
        thread.start();
    }

    @Override
    public void destroy() throws Exception
    {
        LOG.info("Shutting down JustLogIt thread");
        thread.interrupt();
    }
}
Any ideas?
Thanks!