Whenever I run my code, I upload the result to my S3 bucket, and the uploaded files don't follow a specific naming pattern.
Suppose the following files are present in the bucket right now.
2017-12-06 11:40:47 93185 RAW_D_3600_S_1_1294573351559346926-0.metadata
2017-12-06 11:40:47 167170 RAW_D_3600_S_1_1294573351559346926-0.txt
2017-12-06 10:55:54 21 USERENROLL_1_1-0.metadata
2017-12-06 10:55:54 190 USERENROLL_1_1-0.txt
2017-12-05 17:56:36 174 USERENROLL_1_1-duke1.csv
2017-12-06 11:13:45 105 USERENROLL_1_7-0.metadata
2017-12-06 11:13:45 599 USERENROLL_1_7-0.txt
2017-12-06 11:15:51 126 USERENROLL_1_8-0.metadata
2017-12-06 11:15:51 600 USERENROLL_1_8-0.txt
2017-12-06 10:59:26 21 USERENROLL_1_9-0.metadata
2017-12-06 10:59:26 181 USERENROLL_1_9-0.txt
I want to download the file that was created at 2017-12-06 11:40:47.
Currently I fetch the file by storing the current time and searching for objects in the bucket that were generated after the stored time. But this doesn't always give accurate results, because a file might take longer than expected to be generated.
Is there a way to tackle this problem?
PS - Can't change the naming convention being used.
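For reference, a minimal sketch of that timestamp-based lookup using the AWS SDK for C++ (the bucket name is a placeholder and the cutoff handling is only illustrative; note that ListObjectsV2 returns at most 1000 keys per call):

#include <aws/core/Aws.h>
#include <aws/core/utils/DateTime.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/ListObjectsV2Request.h>
#include <aws/s3/model/ListObjectsV2Result.h>
#include <iostream>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::S3::S3Client s3;                                        // default credential chain / region
        Aws::Utils::DateTime cutoff = Aws::Utils::DateTime::Now();   // stored before the run starts

        // ... run the job that uploads its result to the bucket ...

        Aws::S3::Model::ListObjectsV2Request req;
        req.SetBucket("my-result-bucket");                           // placeholder bucket name

        auto outcome = s3.ListObjectsV2(req);                        // at most 1000 keys per call;
                                                                     // follow NextContinuationToken for more
        if (outcome.IsSuccess()) {
            for (const auto& obj : outcome.GetResult().GetContents()) {
                // keep only objects created after the stored cutoff
                if (obj.GetLastModified().Millis() > cutoff.Millis())
                    std::cout << "candidate: " << obj.GetKey() << std::endl;
            }
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}

The timing problem described above remains: if the upload finishes later than expected, the listing has to be retried until a candidate newer than the cutoff appears.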
I have a page that uploads a file using apex_data_parser when the user clicks a button.
The SQL below, which uses apex_data_parser to upload a file with 550K rows (52 MB), takes about 180 minutes to finish.
The same file on the same page, but with some rows deleted so that it has 500K rows (48 MB), takes about 8 minutes to finish.
I also tested the same file with 750K rows and 40 MB (I deleted a description column to stay below 50 MB); that took 15 minutes.
Does anybody have any idea why the time is so different for the file above 50 MB?
Reading the API documentation, the limits are 2 GB and 4 GB for the file.
I took the SQL from the sample upload app in the App Gallery:
FOR r_temp IN (SELECT line_number, col001, col002, col003, col004,
                      col005, col006, col007, col008
                 FROM apex_application_temp_files f,
                      TABLE( apex_data_parser.parse(
                                 p_content   => f.blob_content,
                                 p_file_type => 2,
                                 p_skip_rows => 1,
                                 p_file_name => f.filename ) ) p
                WHERE f.name = p_name)
LOOP
    ...
It's a .csv file
Column Position   Column Name     Data Type       Format Mask
1                 LINE            NUMBER          -
2                 ACCOUNT         NUMBER          -
3                 DATETIME_CALL   DATE            YYYY"-"MM"-"DD" "HH24":"MI":"SS
4                 TYPE_CALL       VARCHAR2(255)   -
5                 CALL            NUMBER          -
6                 DURATION        NUMBER          -
7                 UNIT            VARCHAR2(50)    -
8                 PRICE           NUMBER          -
What I did next:
To simplify the problem, I changed the SQL statement to a simple count(*).
I created a demo account at Oracle Cloud and started an Autonomous Transaction Processing instance to test with the same file and the same application. The results: the file greater than 50 MB took 6 hours to execute a SQL count statement (see attachment below); the 48 MB file took 3 minutes to execute the same SQL count statement. Maybe an apex_data_parser limit?
The chart below is interesting: User I/O goes up a lot, but only with files > 50 MB in my tests.
I took the 50 MB file that had processed OK in 3 minutes and copied some rows to grow it to 70 MB (so the file is not corrupt).
I believe the answer can be found in this question: Oracle APEX apex_data_parser execution time explodes for big files
I don't think it's a function of the file size; it's more a function of the shape and width of the data.
180 minutes is a super long time. If you're able to reproduce this, can you examine the active database session and determine the active SQL statement and any associated wait event?
Also - what file format is this, and what database version are you using?
I have scanned ColdFusion code using the CFLint jar CFLint-1.3.0-all.jar from the command line as
java -jar <jar path> -folder <mylocalColdfusioncodeFolder>
It produces a cflint-result.html file in the corresponding folder.
In the report, I found none of the cross-site scripting and DOM-related issues reported by the Fortify Audit Workbench tool. CFLint basically reports language-specific issues because it mainly runs on CFParser.
When I ran the command below to see which rules the scan checks against, I found that all of them are language-specific rules.
java -jar CFLint-1.3.0-all.jar -rules gives the following list of rules:
The Supported rules to check against the cfm code :
-----------------------------------------------------
1 ComplexBooleanExpressionChecker
2 GlobalLiteralChecker
3 CFBuiltInFunctionChecker
4 CreateObjectChecker
5 CFDumpChecker
6 FunctionTypeChecker
7 ArrayNewChecker
8 LocalLiteralChecker
9 SelectStarChecker
10 TooManyFunctionsChecker
11 QueryParamChecker
12 FunctionLengthChecker
13 OutputParmMissing
14 WriteDumpChecker
15 CFExecuteChecker
16 ComponentLengthChecker
17 GlobalVarChecker
18 CFModuleChecker
19 CFIncludeChecker
20 CFDebugAttributeChecker
21 ComponentDisplayNameChecker
22 ArgVarChecker
23 NestedCFOutput
24 VarScoper
25 FunctionHintChecker
26 ArgumentNameChecker
27 TooManyArgumentsChecker
28 SimpleComplexityChecker
29 TypedQueryNew
30 CFInsertChecker
31 StructKeyChecker
32 BooleanExpressionChecker
33 VariableNameChecker
34 MethodNameChecker
35 AbortChecker
36 ComponentNameChecker
37 UnusedArgumentChecker
38 StructNewChecker
39 PackageCaseChecker
40 CFAbortChecker
41 ComponentHintChecker
42 ArgumentTypeChecker
43 CFUpdateChecker
44 IsDebugModeChecker
45 ArgDefChecker
46 UnusedLocalVarChecker
47 CFSwitchDefaultChecker
48 ArgumentHintChecker
49 CFCompareVsAssignChecker
-----------------------------------------------------
And I found that CFLint does not raise errors for CSS (cross-site scripting) attacks. When I ran the Fortify tool (Audit Workbench) against the same ColdFusion code folder, I got issues like:
Cross-Site Scripting: Reflected
Cross-Site Scripting: DOM
Unreleased Resource
Dynamic Code Evaluation: Code Injection
Hardcoded Password
SQL Injection
Path Manipulation
Log Forging and Privacy Violation, with the tags cfdocument, cfdirectoryexists, cfcookie, cflog, cffile...
Can you please clarify whether CFLint scans for CSS issues, or whether it only checks rules specific to the ColdFusion language?
CFLint is only concerned with ColdFusion code. It is not a security scanner, nor a CSS linter. You are mixing up your tools and their purposes.
A linter scans for code issues - not security issues.
Question: how can I list all files on a volume together with the size they occupy on disk?
Applicable solutions:
cmd script
free tool with sqlite/txt/xls/xml/json output
C++ / winapi code
The problem:
There are many tools and APIs that list files, but their results don't match chkdsk and the actual free-space info:
                                   Size     Count (x1000)
chkdsk c:                          67 GB    297
dir /S                             42 GB    267
FS Inspect                         47 GB    251
Total Commander (Ctrl+L)           47 GB    251
explorer (selection size)          44 GB    268
explorer (volume info)             67 GB    -
WinDirStat                         45 GB    245
TreeSize                           couldn't download it - site unavailable
C++ FindFirstFile/FindNextFile     50 GB    288
C++ GetFileInformationByHandleEx   50 GB    288
The total volume size is 70 GB; about 3 GB is actually free.
I'm aware of:
A file can occupy more space on disk than its actual size; I need the size it occupies (i.e. the greater one).
Symlinks, junctions, etc. - it would be good to see them (though I don't think this alone can account for a 20 GB difference in my case).
The filesystem uses some space for indexes and system info (chkdsk shows this is negligible and doesn't account for 20 GB).
I run all tools with admin privileges, and hidden files are shown.
The FindFirstFile/FindNextFile C++ solution doesn't give correct results; I don't know why, but it gives the same figures as Total Commander, NOT the same as chkdsk.
Practical problem:
I have a 70 GB SSD; all the tools report that about 50 GB is occupied, but in fact it's almost full.
Formatting everything and reinstalling is not an option, since this would happen again quite soon.
I need a report of file sizes whose total matches the actual used and free space. I'm looking for an existing solution: a tool, a script, a C++ library, or C++ code.
(Actual output below)
chkdsk c:
Windows has scanned the file system and found no problems.
No further action is required.
73715708 KB total disk space.
70274580 KB in 297259 files.
167232 KB in 40207 indexes.
0 KB in bad sectors.
463348 KB in use by the system.
65536 KB occupied by the log file.
2810548 KB available on disk.
4096 bytes in each allocation unit.
18428927 total allocation units on disk.
702637 allocation units available on disk.
dir /S
Total Files Listed:
269966 File(s) 45 071 190 706 bytes
143202 Dir(s) 3 202 871 296 bytes free
FS Inspect http://sourceforge.net/projects/fs-inspect/
47.4 GB 250916 Files
Total Commander
49709355k, 48544M 250915 Files
On a POSIX system, the answer would be to use the stat function. Unfortunately, it does not report the number of allocated blocks on Windows, so it does not meet your requirements.
The correct function from the Windows API is GetFileInformationByHandleEx. You can use FindFirstFile/FindNextFile to walk the full disk and ask for FileStandardInfo to get a FILE_STANDARD_INFO that contains, among other fields, LARGE_INTEGER AllocationSize for the allocated size and LARGE_INTEGER EndOfFile for the used size.
Alternatively, you can call GetFileInformationByHandleEx directly on directories, asking for FileIdBothDirectoryInfo to get FILE_ID_BOTH_DIR_INFO structures. This lets you get information on many files in a single call. My advice would be to use that one, even though it is less commonly used.
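For illustration, a minimal sketch of the per-file variant (FindFirstFile/FindNextFile plus GetFileInformationByHandleEx with FileStandardInfo); reparse points are skipped to avoid double counting, while long paths, hard links, and alternate data streams are not handled, and the default root path is just an example:

#include <windows.h>
#include <cstdio>
#include <string>

static unsigned long long g_allocated = 0;              // sum of AllocationSize over all files

static void Walk(const std::wstring& dir)
{
    WIN32_FIND_DATAW fd;
    HANDLE hFind = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (hFind == INVALID_HANDLE_VALUE)
        return;                                          // no access, or nothing matched
    do {
        const std::wstring name = fd.cFileName;
        if (name == L"." || name == L"..")
            continue;
        const std::wstring path = dir + L"\\" + name;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)
            continue;                                    // skip junctions/symlinks
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            Walk(path);                                  // recurse into subdirectory
        } else {
            HANDLE h = CreateFileW(path.c_str(), FILE_READ_ATTRIBUTES,
                                   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                   nullptr, OPEN_EXISTING, 0, nullptr);
            if (h != INVALID_HANDLE_VALUE) {
                FILE_STANDARD_INFO info;
                if (GetFileInformationByHandleEx(h, FileStandardInfo, &info, sizeof(info)))
                    g_allocated += static_cast<unsigned long long>(info.AllocationSize.QuadPart);
                CloseHandle(h);
            }
        }
    } while (FindNextFileW(hFind, &fd));
    FindClose(hFind);
}

int wmain(int argc, wchar_t* argv[])
{
    std::wstring root = (argc > 1) ? argv[1] : L"C:\\";
    if (!root.empty() && root.back() == L'\\')
        root.pop_back();                                 // Walk() appends "\\*" itself
    Walk(root);
    wprintf(L"Allocated on disk: %llu bytes\n", g_allocated);
    return 0;
}

Even so, files that are not reachable this way (MFT, system indexes, the change journal, files open exclusively by the system) will keep the total below what chkdsk reports.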
To get a list of all files (including hidden and system files), sorted within each directory by descending size, you can open cmd.exe and type:
dir /s/a:-d/o:-s C:\* > "list_of_files.txt"
Where:
/s lists files within the specified directory and all subdirectories,
/a:-d lists only files (no directories),
/o:-s puts files within each directory in descending size order,
C:\* means all directories on drive C,
> "list_of_files.txt" means save the output to the list_of_files.txt file
Listing files grouped by directory may be a little inconvenient, but it's the easiest way to list all files. For more information, take a look at technet.microsoft.com
Checked on Win7 Pro.
Is it possible to avoid the use of passwords when using the SAS Metadata Batch Export Tool?
I am building a feature in my STP web app (SAS 9.2, IWA, Kerberos) for auto-exporting metadata items. As per the documentation, the ExportPackage utility requires credentials either directly (the -user and -password options, etc.) or via a connection profile (-profile).
Logging onto the application server as sassrv, the contents of my connection profile are as follows:
#Properties file updated on: Thu Mar 12 16:35:07 GMT 2015 !!!!! DO NOT EDIT !!!!!!!
#Thu Mar 12 16:35:07 GMT 2015
Name=SAS
port=8561
InternalAccount=false
host=DEV-SASMETA.somecompany.int
AppServer.Default=A5MNZZZZ.AR000666
AllowLocalPasswords=true
authenticationdomain=DefaultAuth
SingleSignOn=true
Running my code, however, results in the following:
44 +%put Batch tool located at: &platform_object_path;
Batch tool located at: C:\Program Files\SAS/SASPlatformObjectFramework/9.2
45 +filename inpipe pipe
46 + " ""&platform_object_path\ExportPackage"" -profile MyProfile
47 + -package 'C:\Temp\TestPackage.spk' -objects '/SomeFolder/ARCHIVE(Folder)' -includeDep -subprop";
48 +data _null_;
49 + infile inpipe;
50 + input; putlog _infile_;
51 +run;
NOTE: The infile INPIPE is:
Unnamed Pipe Access Device,
PROCESS="C:\Program Files\SAS/SASPlatformObjectFramework/9.2\ExportPackage" -profile MyProfile -package 'C:\Temp\TestPackage.spk' -objects '/SomeFolder/ARCHIVE(Folder)' -includeDep
-subprop,
RECFM=V,LRECL=256
The export process has failed. The native implementation module for the security package could not be found in the path.
For more information, view the export log file: C:\Users\sassrv\AppData\Roaming\SAS\Logs\Export_150427172003.log
NOTE: 2 records were read from the infile INPIPE.
The minimum record length was 112.
The maximum record length was 121.
The log file was empty.
Presumably my options here are limited to:
Requesting the user password from the front end
Using a system account in the connection profile
Using a system account in the -user & -password options
??
I verified this some time ago with SAS Technical Support; it's currently not possible.
I entered a SASware Ballot item for it: https://communities.sas.com/t5/SASware-Ballot-Ideas/Add-Integrated-Windows-Authentication-IWA-support-to-Batch/idi-p/220474
Please vote!
This was initially resolved by embedding a username/password in the profile, but now it works by taking the (modified) template profile below and adding the host= / port= parameters dynamically at runtime (so the same profile can be used in different environments).
IWA is now used to connect to the metadata server!
# This file is used by Release Management for connecting to the metadata server
# The host= and port= parameters are provided by the application at runtime
SingleSignOn=true
AllowLocalPasswords=true
InternalAccount=false
SecurityPackageList=Negotiate,NTLM
SecurityPackage=Negotiate
An important thing I discovered (not in the SAS documentation) is that you can substitute the profile name with an absolute path to a profile (.swa file) in the ExportPackage command.
Edit (one year later):
As pointed out by @Stig Eide, the error does seem to relate to 32- vs 64-bit JREs. I also came across this issue in DI Studio today and solved it by copying the sspiauth.dll files as described here.
My program creates a log file every 10 seconds in a specified directory. Then, in a different thread, it iterates over the files in that directory. If a file has content, it compresses it and uploads it to external storage; if the file is empty, it deletes it. After the program runs for a while, I get the error "too many open files" (gzopen failed, errno = 24).
When I look inside /proc/<pid>/fd, I see many broken links to files in the same directory where the logs are created, with the word (deleted) next to each link.
Any idea what I am doing wrong? I checked the return values in both threads: of the close function (in the thread that writes the logs) and of boost::filesystem::remove (in the thread that compresses and uploads the non-empty log files and deletes the empty ones). All the return values are zero, yet the list of (deleted) links grows by one every 10 seconds.
I don't think this problem ever happened to me on 32-bit, but I recently moved to 64-bit and now I get this surprise.
You are neglecting to close files you open.
From your description, it sounds like you close the files you open for logging in your logging thread, but you go on to say that you just boost::filesystem::remove files after compressing and/or uploading.
Remember that:
Any compressed file you opened with gzopen has to be gzclosed
Any uncompressed file you open to compress it has to be closed.
If you open a file to check if it's empty, you have to close it.
If you open a file to transfer it, you have to close it.
The output of /proc/<pid>/fd would be very helpful in narrowing this down, but unfortunately you didn't post it. Examples of how seemingly unhelpful output gives subtle hints:
# You forgot to gzclose the output file after compressing it
l-wx------ user group 64 Apr 9 10:17 43 -> /tmp/file.gz (deleted)
# You forgot to close the input file after compressing it
lr-x------ user group 64 Apr 9 10:17 43 -> /tmp/file (deleted)
# You forgot to close the log file after writing it
l-wx------ user group 64 Apr 9 10:17 43 -> /tmp/file (deleted)
# You forgot to close the input file after transferring it
lr-x------ user group 64 Apr 9 10:17 43 -> /tmp/file.gz (deleted)
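Putting that together, a minimal sketch of the compress-then-remove step with every descriptor closed, assuming zlib's gzopen (as the error message suggests) and Boost.Filesystem for the removal; the function name, paths, and buffer size are illustrative:

#include <zlib.h>
#include <boost/filesystem.hpp>
#include <fstream>
#include <vector>

// Compress src into dst, then delete src. Both the input stream and the gzFile
// are closed before the removal, so no "(deleted)" entry lingers in /proc/<pid>/fd.
void compress_and_remove(const boost::filesystem::path& src,
                         const boost::filesystem::path& dst)
{
    {
        std::ifstream in(src.string().c_str(), std::ios::binary);  // closed automatically at end of scope
        gzFile out = gzopen(dst.string().c_str(), "wb");
        if (!in.is_open() || out == nullptr) {
            if (out) gzclose(out);
            return;
        }
        std::vector<char> buf(64 * 1024);
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
            gzwrite(out, buf.data(), static_cast<unsigned>(in.gcount()));
        gzclose(out);                      // forgetting this leaks one fd per compressed file
    }
    boost::filesystem::remove(src);        // safe: both handles are closed by now
}

If the upload step opens the .gz again, that handle has to be closed the same way before the file is removed, otherwise it will show up as another (deleted) entry.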