Ingesting from NiFi to HDFS as a single file - hdfs

Scenario
CSV data named test_csv.csv, shipped from Windows.
Ingesting the CSV data into HDFS: Beats > (ListenBeats) NiFi (PutHDFS) > HDFS
Data sample:
a,b,c,d,e
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6
a7,b7,c7,d7,e7
a8,b8,c8,d8,e8
According to the NiFi flow UI it works fine and the data is successfully written into HDFS. The problem is:
hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test
Found 9 items
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/0192a8bb-67ec-462e-a602-62a5425afc99
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/0211ec05-fc62-4b82-87e5-a2e20a9fb07e
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:30 /user/nifi/test/1e227df9-f49f-46d6-a309-25e466fa14cf
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/324a0c0e-e190-4239-b594-edbf9fcab0d6
-rw-r--r-- 3 nifi hdfs 474 2020-07-06 14:30 /user/nifi/test/3d34827b-6bae-4c21-981e-9722b7a6703e
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:30 /user/nifi/test/6873c51b-a93b-4872-b33c-0e59b85afcd5
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/98606d6b-2206-4b2e-8204-8363a87f41d0
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/f25e56b5-88d7-4135-b475-213e4e54b47f
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:30 /user/nifi/test/f354f587-8da2-418f-be0d-34e8a79d7d39
I've tried changing the PutHDFS directory to /user/nifi/test.csv, which returns:
hadoop@ambari:~$ hdfs dfs -cat /user/nifi/test.csv
cat: `/user/nifi/test.csv': Is a directory
hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test.csv
Found 9 items
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/02cdc89d-3cb9-494a-b7f5-d280d7b7c65e
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/2476906a-00d9-463a-89ef-ea885f823faa
-rw-r--r-- 3 nifi hdfs 474 2020-07-06 14:35 /user/nifi/test.csv/5b9a9d7e-0c2f-428c-8af4-e875c6db1a04
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/66017da5-b55f-437b-a3cf-0a6b45d86ce8
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/7be93660-75a1-416b-b019-656d466813d6
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/98877296-126c-4ac9-9da5-cef62937e9f9
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:35 /user/nifi/test.csv/ac075d33-1137-4aea-9e5b-fc11097558eb
-rw-r--r-- 3 nifi hdfs 480 2020-07-06 14:35 /user/nifi/test.csv/b9b44c08-1bc6-4e33-947b-daf265491181
-rw-r--r-- 3 nifi hdfs 481 2020-07-06 14:35 /user/nifi/test.csv/ba6464db-ef64-4993-a070-80f1392eac1e
Is it possible to make NiFi write to a single file in HDFS?
I was expecting that it would create a test.csv file in HDFS.
Thank you

Every flow file in NiFi has an attribute named "filename" and that is what PutHDFS is using as the filename in HDFS. The "Directory" property in PutHDFS is only for the directory, so you want to put only "/user/nifi".
In order to change the filename, you would put an UpdateAttribute processor right before PutHDFS, and set filename = whatever-you-want.csv
If you set it to a static value, then every write after the first will hit an existing file and conflict: it will either replace the file or throw an error, depending on the conflict resolution setting. So you probably want to use a MergeContent/MergeRecord processor first to batch many small CSV entries into a larger flow file, and then create a dynamic filename like:
filename = test-${now()}.csv
You can use a different expression, but just something unique like a timestamp, date string, or UUID.
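
For example, a minimal sketch of the relevant processor properties (the merge thresholds are assumptions to tune for your volume; the filename expression uses standard NiFi Expression Language functions):

ListenBeats -> MergeContent -> UpdateAttribute -> PutHDFS

MergeContent:
  Merge Strategy = Bin-Packing Algorithm
  Minimum Number of Entries = 1000   (assumption: tune to your batch size)
  Max Bin Age = 5 min                (assumption: flush partial bins after this)
UpdateAttribute:
  filename = test-${now():format('yyyyMMddHHmmss')}-${UUID()}.csv
PutHDFS:
  Directory = /user/nifi/test

With this, each merged batch lands in /user/nifi/test as one uniquely named CSV file instead of one file per incoming flow file.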

Related

Running SageMaker ProcessingJob as non root user

Files copied by SageMaker from S3 into the Docker container inside a ProcessingJob have root ownership and permissions that do not allow non-owner (non-root) users to write to them.
I'd like to run the Docker container as a non-root user and still be able to write to the folders created by SageMaker, so the Dockerfile looks like this:
FROM base
...
USER nonroot
Example permissions and ownership of the folders copied from S3 into SageMaker's Docker container:
2022-11-30T10:20:13.567+01:00 + ls -la /opt/ml/processing
2022-11-30T10:20:13.567+01:00 total 24
2022-11-30T10:20:13.567+01:00 drwxr-xr-x 6 root root 4096 Nov 30 09:20 .
2022-11-30T10:20:13.567+01:00 drwxr-xr-x 5 root root 4096 Nov 30 09:20 ..
2022-11-30T10:20:13.567+01:00 drwxr-xr-x 2 root root 4096 Nov 30 09:20 data
2022-11-30T10:20:13.568+01:00 drwxr-xr-x 2 root root 4096 Nov 30 09:20 output
I'd expect these folders to have nonroot user ownership (or at least be writable by nonroot).
I've checked the documentation but had no luck there. If there's an obvious way to achieve this that I missed, please let me know. Thanks!
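
One hedged workaround sketch, assuming you control the image build: pre-create the processing directories at build time and hand them to the non-root user. This is an assumption-laden sketch rather than a confirmed fix; SageMaker may still mount the S3 input over /opt/ml/processing with root ownership at runtime.

FROM base
# Assumption: the nonroot user does not yet exist in the base image
RUN useradd -m nonroot \
 && mkdir -p /opt/ml/processing/data /opt/ml/processing/output \
 && chown -R nonroot:nonroot /opt/ml/processing
USER nonroot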

Symbolic link to AWS EFS on web server doesn't work?

I have a static folder serving files on uWSGI.
/user/app/static/
lrwxrwxrwx 1 root root 23 Oct 13 09:40 _out -> /usr/src/app/_mat/_out/
drwxr-xr-x 8 root root 4096 Oct 13 09:49 assets
drwxr-xr-x 8 root root 4096 Oct 13 09:40 pages
In this case, the image files under assets are served correctly; however, the image files under _out cannot be accessed (a 404 error occurs):
static/assets/test.png is OK
static/_out/test.png returns a 404 error
/usr/src/app/_mat/ is on AWS EFS.
I checked the permissions.
Generally speaking, do symbolic links work under a web server?
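
A diagnostic sketch for this kind of symlink 404: every directory along the link target's path needs execute permission for the web-server user (www-data below is an assumption; substitute the account uWSGI/nginx actually runs as):

# Can the worker user list the link target?
sudo -u www-data ls -la /usr/src/app/_mat/_out/
# Show owner/permissions for each component of the resolved path
namei -l /usr/src/app/_mat/_out/test.png

If nginx serves the static files in front of uWSGI, also check that its disable_symlinks directive is not enabled for that location.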

error: error creating output file /var/lib/logrotate.status.tmp: Permission denied

I am trying to logrotate my log files. Here is my configuration file:
/home/deploy/apps/production_app/current/log/*.log {
daily
missingok
rotate 52
compress
create 0644 deploy deploy
delaycompress
notifempty
sharedscripts
copytruncate
}
And this is the result of
ll apps/production_app/current/log/
on my log files:
-rw-rw-r-- 1 deploy deploy 0 Jul 1 10:01 production.log
-rw-rw-r-- 1 deploy deploy 1124555 Jul 1 10:01 production.log.1
And when I run this command
logrotate -v /etc/logrotate.d/production_app
I get the following:
error: error creating output file /var/lib/logrotate.status.tmp: Permission denied
And here is permission on my log-rotate config file
lrwxrwxrwx 1 root root 67 Feb 25 2019 /etc/logrotate.d/production_app -> /home/deploy/apps/production_app/shared/config/log_rotation
Please check whether the directory /var/lib is writable by the user running logrotate; logrotate writes its state file there by default, which normally requires root.
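
If you are running logrotate as the deploy user rather than root, it cannot write the default state file under /var/lib. A common workaround sketch is to point logrotate at a state file the user can write via the -s/--state option (the state-file path below is an assumption; any writable location works):

logrotate -v -s /home/deploy/logrotate.status /etc/logrotate.d/production_app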

Tomcat's logs are empty after rotating - AWS EC2 - Tomcat 8

Every Tomcat log is empty. When I access the EC2 instance via the console and look in the /var/log/tomcat8 dir, every log file is empty at 0 KB.
I already tried changing logrotate.elasticbeanstalk.tomcat8.conf, but without success.
This is the entire logrotate.elasticbeanstalk.tomcat8.conf file as I last changed it:
/var/log/tomcat8/* {
size 10M
rotate 5
missingok
compress
notifempty
dateext
dateformat %s
olddir /var/log/tomcat8/rotated
}
This is how Tomcat's log files look:
-rw-r--r-- 1 tomcat tomcat 0 Jul 18 00:01 catalina.2019-07-17.log
-rw-r--r-- 1 tomcat tomcat 0 Jul 18 17:01 catalina.2019-07-18.log
-rw-r--r-- 1 tomcat tomcat 0 Jul 19 19:01 catalina.2019-07-19.log
Look in the rotated subdirectory for .gz files with the same filename as a prefix. E.g., catalina.2019-07-17*.gz
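
A quick check, assuming the olddir from the config above (with dateext and dateformat %s, the rotated names carry an epoch-seconds suffix before the .gz):

ls -la /var/log/tomcat8/rotated/
zcat /var/log/tomcat8/rotated/catalina.2019-07-17*.gz | head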

Google PageSpeed Apache Error log CentOS

I installed the Google PageSpeed module on my CentOS 7.0 DA VPS.
I used this blog for installing the PageSpeed module: http://www.haloseeker.com/install-go...h-directadmin/
When I check my Apache error log, I find the following errors:
[pagespeed:error] [pid 2593] [mod_pagespeed 1.11.33.1-0 #2593] Could not create directories for file /var/cache/mod_pagespeed/v3/domain.com/https,3A/,2Fwww.domain.com/icon_feed.gif,.temp
[pagespeed:error] [pid 2593] [mod_pagespeed 1.11.33.1-0 #2593] /var/cache/mod_pagespeed/v3/domain.com/https,3A/,2Fwww.domain.com/icon_feed.gif,.temp8f2OKe:0: opening temp file: No such file or directory
[pagespeed:error] [pid 2673] [mod_pagespeed 1.11.33.1-0 #2673] Failed to make directory /var/cache/mod_pagespeed/v3/domain.com/https,3A/,www.domain.com/images: Permission denied
How can I solve this problem?
Try updating your directory permissions with the following command and let me know if you have any issues.
chmod -R a+w /var/cache/mod_pagespeed
This won't necessarily work. Depending on the Apache configuration (mpm_itk), each vhost may be served as a different user.
So important directories get created as user A, and when a request then comes in for host B, user B cannot delete/create subdirectories.
I haven't figured out how to solve this, running CentOS and cPanel as the hoster.
-rw-------. 1 tvr86nl tvr86nl 13 Aug 17 23:14 !clean!time!
drwxr-xr-x. 4 tvr86nl tvr86nl 37 Aug 17 00:45 prop_page
drwxr-xr-x. 12 tvr86nl tvr86nl 4.0K Aug 17 12:54 rname
drwxr-xr-x. 3 tvr86nl tvr86nl 22 Aug 16 18:16 v3
root@vps1.sse-ict.nl /var/mod_pagespeed/cache>
So this happens every time these directories are created :(
Perhaps a cron job would do the trick, but it's a mod_pagespeed shortcoming!
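
A hedged sketch of such a cron job in /etc/cron.d format (the file name, the five-minute interval, and the blanket a+w permissions are all assumptions; scope them to whatever accounts your vhosts actually run as):

# /etc/cron.d/pagespeed-cache-perms (hypothetical file)
*/5 * * * * root chmod -R a+w /var/cache/mod_pagespeed 2>/dev/null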