How to specify the directory where to save the best checkpoint directories?

The documentation for ray.air.session.report (https://docs.ray.io/en/latest/tune/api_docs/trainable.html?highlight=air.CheckpointConfig#class-api-checkpointing) mentions that it moves checkpoints to a new path. Is there a way to specify what that path is (the default is /tmp/)?

Related

How to check whether a folder exists or not in gcp cloud storage out of 1 million folders

For example I have structure like this.
bucketname/checked/folder1/some files
bucketname/checked/folder2/some files
bucketname/checked/folder3/some files
bucketname/checked/folder4/some files
bucketname/checked/folder5/some files
bucketname/checked/folder6/some files
bucketname/checked/folder7/some files
bucketname/checked/folder8/some files
bucketname/checked/folder9/some files
bucketname/checked/folder10/some files
bucketname/checked/folder11/some files
......
......
bucketname/checked/folder-1million/some files
Now,
1. If I have to check whether folder99999 exists, what is the best way to do it, given that we know the folder name (folder99999)?
2. If we simply check whether the path exists, and treat a missing path as meaning the folder doesn't exist, would that work well with millions of folders?
3. Which data structure does GCP use to retrieve the folder data?

The true answer is the one provided by John: folders don't exist. All files are stored at the root (bucket level), and the file name is the full path. By human convention, / is the folder separator, and the console displays fake folders.
If there are no files in a "folder", the "folder" doesn't exist; it is not deduced from the fully qualified path name. A folder is not a Cloud Storage resource.
That is also why you can only search by path prefix.
However, it depends on what you want to check. If you know exactly which folder you want to check and validate, and there is at least one file in it, you can list the files directly with the folder path as the prefix.
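The prefix check described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: folder_exists is a hypothetical helper name, and it assumes a client object exposing list_blobs(bucket_name, prefix=..., max_results=...), as the google-cloud-storage storage.Client does. The trailing slash on the prefix is there so that folder1 does not also match folder10.

```python
def folder_exists(client, bucket_name, folder):
    """Return True if at least one object name starts with '<folder>/'.

    `client` is assumed to behave like google.cloud.storage.Client.
    """
    # Normalise to exactly one trailing slash so "folder1" != "folder10".
    prefix = folder.rstrip("/") + "/"
    # max_results=1: we only need to know whether anything matches.
    blobs = client.list_blobs(bucket_name, prefix=prefix, max_results=1)
    return any(True for _ in blobs)


# Usage (requires the google-cloud-storage package and credentials):
#   from google.cloud import storage
#   folder_exists(storage.Client(), "bucketname", "checked/folder99999")
```

Because only one result is requested for a single prefix, the cost of the call should not depend on how many other "folders" share the bucket.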

file path of '/tmp' in Django Session

Right now I am using file-based sessions in Django to save data.
SESSION_ENGINE = "django.contrib.sessions.backends.file"
According to the documentation, Django saves the data in /tmp, but I don't understand what the actual path of this /tmp is. Is it a directory in my project, or somewhere else?

I think you misread the documentation. The documentation [Django-doc] says:
You might also want to set the SESSION_FILE_PATH setting (which defaults to output from tempfile.gettempdir(), most likely /tmp) to control where Django stores session files. Be sure to check that your Web server has permissions to read and write to this location.
If we check the documentation on the tempfile.gettempdir() [Python-doc] we get:
Return the name of the directory used for temporary files. This defines the default value for the dir argument to all functions in this module.
Python searches a standard list of directories to find one which the calling user can create files in. The list is:
1. The directory named by the TMPDIR environment variable.
2. The directory named by the TEMP environment variable.
3. The directory named by the TMP environment variable.
4. A platform-specific location:
   On Windows, the directories C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order.
   On all other platforms, the directories /tmp, /var/tmp, and /usr/tmp, in that order.
5. As a last resort, the current working directory.
The result of this search is cached, see the description of tempdir below.
So although on Unix-based systems (Linux, BSD, Mac OS X, etc.) the data will typically be stored in /tmp, the actual location depends on the operating system and on the environment variables listed above.
On Unix-based file systems, a leading slash (/) means an absolute path, so /tmp is the tmp directory at the root of the filesystem, not a directory in your project. For more information on Unix file paths, see this article [geeksforgeeks].
If, however, you set SESSION_FILE_PATH to a specific path, then that path will be used.
Note that temporary files are, well, temporary. Typically you should not assume that after a reboot, the files are still there.
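To see which directory applies on a given machine, the documented default can be reproduced with a few lines of Python. This is a minimal sketch: resolve_session_dir is a hypothetical helper that only mirrors the rule quoted above (an explicit SESSION_FILE_PATH wins, otherwise tempfile.gettempdir() is used); it is not Django's own code, and the path in the comment is a made-up example.

```python
import tempfile


def resolve_session_dir(session_file_path=None):
    """Mirror the documented default for file-based sessions.

    An explicitly configured path takes precedence; otherwise fall back
    to the directory Python selects for temporary files.
    """
    return session_file_path or tempfile.gettempdir()


if __name__ == "__main__":
    # Prints the directory that would be used when nothing is configured.
    print(resolve_session_dir())
    # In settings.py one could instead pin a location, for example:
    #   SESSION_FILE_PATH = "/var/lib/myapp/sessions"  # hypothetical path
```

Running it under the same user and environment as the web server matters, since TMPDIR, TEMP, and TMP can differ between a shell and a server process.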

Flume - spooling dir source - ingesting sub directories

I am currently using Flume 1.7 and have configured a spooling directory source. I have enabled recursiveDirectorySearch=true to look into the subdirectories for files.
source.spoolDir=/tmp/test
Under /tmp/test, subdirectories get created with data files, for example /tmp/test/data1/file.csv and /tmp/test/data2/file2.csv.
I want the exact sub directory structure to be created in the HDFS sink path.
/sink/data1/file.csv
/sink/data2/file2.csv
When I use %{file} for the HDFS sink file path, I get the complete absolute path, and %{basename} gives me only the file name. I want to extract the subdirectory structure from the spoolDir source path. Is there any way to achieve this?

You can make use of the fileHeader and fileHeaderKey properties and refer to this header variable at your sink configuration to get the absolute path.
https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
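The file header only carries the absolute path, so the spool directory still has to be stripped from it before the result can be used under the sink path. The Python sketch below illustrates just that path transformation; it is not Flume configuration or a Flume API, and sink_path_for is a hypothetical helper name. Inside Flume, equivalent logic would have to live in something like a custom interceptor, which is an assumption not covered by the answer above.

```python
import posixpath


def sink_path_for(absolute_file, spool_dir, sink_root):
    """Map a file under spool_dir to the same relative location under sink_root."""
    # Path of the file relative to the spool directory, e.g. "data1/file.csv".
    relative = posixpath.relpath(absolute_file, spool_dir)
    if relative.startswith(".."):
        raise ValueError(f"{absolute_file} is not inside {spool_dir}")
    return posixpath.join(sink_root, relative)


if __name__ == "__main__":
    print(sink_path_for("/tmp/test/data1/file.csv", "/tmp/test", "/sink"))
    print(sink_path_for("/tmp/test/data2/file2.csv", "/tmp/test", "/sink"))
```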

Informatica B2B Data Exchange

I am new to B2B DX and have been using it for two months. I have a requirement where files are generated in a dynamic folder. For example, if the file name is 20170503test.txt, it is generated at /2017_05/20170503/20170503test.txt.
The next day it will be generated at /2017_05/20170504/20170504test.txt. How can my endpoint pick up these files when they are generated in different folders? I can set the file pattern to *test.txt, but how can the endpoint look in different directories?

If you go high enough in the target file system, there will be a single directory that is common to all the target files, even if it is the root directory. Set your target folder to that. Then, in the mapping itself, create a special FileName port on your target. This dynamically sets the name of the output file to whatever value the port holds, so you can build that string to include the full dynamic file path below the shared directory described above.
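The string that such a port would need to hold follows directly from the file name in the question. The Python sketch below only illustrates that derivation; it is not Informatica mapping syntax, dated_path is a hypothetical helper name, and it assumes file names always begin with an eight-digit YYYYMMDD stamp as in the example.

```python
import posixpath


def dated_path(file_name, root="/"):
    """Build <root>/<YYYY>_<MM>/<YYYYMMDD>/<file_name> from a dated file name."""
    stamp = file_name[:8]
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"{file_name!r} does not start with a YYYYMMDD stamp")
    year, month = stamp[:4], stamp[4:6]
    return posixpath.join(root, f"{year}_{month}", stamp, file_name)


if __name__ == "__main__":
    print(dated_path("20170503test.txt"))
    print(dated_path("20170504test.txt"))
```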

Failed to create log file on application directory?

I want to write a log file for my application. The path where I want to store the file is:
destination::"C:\ColdFusion8\wwwroot\autosyn\logs"
I have used the sample below to generate the log file:
<cfset destination = expandPath('logs')>
<cfoutput>destination::"#destination#"</cfoutput><br/>
<cflog file='#destination#/test' application="yes" text="Running test log.">
When I supply the full path, no log file is created. When I remove my destination and provide only a file name, the log is generated in the ColdFusion server path C:\ColdFusion8\logs.
How can I generate a log file in my application directory?

Here is the description of the file attribute according to the cflog tag specs:
Message file. Specify only the main part of the filename. For example,
to log to the Testing.log file, specify "Testing".
The file must be located in the default log directory. You cannot
specify a directory path. If the file does not exist, it is created
automatically, with the extension .log.
You can use the cffile tag to write information to a custom folder.

From the docs for <cflog>:
file
Optional
Message file. Specify only the main part of the filename. For example, to log to the Testing.log file, specify "Testing".
The file must be located in the default log directory. You cannot specify a directory path. If the file does not exist, it is created automatically, with the extension .log.
(My emphasis).
Reading the docs is always a good place to start when wondering how things might work.
So <cflog> will only log to the ColdFusion logs directory, and that is by design.
I don't have CF8 handy, but you should be able to set the logging directory to a different one via either the CFAdmin UI (CF9 has this, I just confirmed) or neo-logging.xml in WEB-INF/cfusion/lib.
Or you could use a different logging mechanism. I doubt it will work on a rusty old CF8 install, but perhaps LogBox?
