Delete S3 bucket versions with delete marker - amazon-web-services

I am trying to delete an S3 bucket that contains versions and delete markers.
I cannot delete them from the console and have been stuck on this for a while.
I have also tried running a few Python scripts, but still nothing changes.

In the Amazon S3 management console you will see a Versions: Hide / Show toggle.
Clicking Show displays all versions of an object, including its delete markers. You can then select the versions and delete markers and delete them.

The following Python code should do what you want:
import boto3

s3_client = boto3.client('s3')

bucket = "my_s3_bucket_1234aefa"
file_to_delete = "file_i_want_to_delete.png"

# Collect the version IDs of every version and delete marker for this key
results = []
response = s3_client.list_object_versions(
    Bucket=bucket,
    Prefix=file_to_delete,
)
for k in ['Versions', 'DeleteMarkers']:
    if k in response:
        k_response = response[k]
        to_delete = [r['VersionId'] for r in k_response if r['Key'] == file_to_delete]
        results.extend(to_delete)

# Permanently delete each version and delete marker
for version in results:
    s3_client.delete_object(Bucket=bucket, Key=file_to_delete, VersionId=version)
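
If the goal is to empty the entire bucket rather than a single key, a shorter sketch using the higher-level boto3 resource API should also work; object_versions.delete() batch-deletes every version and delete marker in the bucket (bucket name as above):

import boto3

s3 = boto3.resource('s3')
versioned_bucket = s3.Bucket("my_s3_bucket_1234aefa")

# Remove every object version and delete marker in one call
versioned_bucket.object_versions.delete()

# The now-empty bucket itself can then be removed:
# versioned_bucket.delete()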

Related

How to delete Feature Group from SageMaker Feature Store, by name

The way to delete a feature group using the SageMaker Python SDK is as follows:
my_feature_group.delete()
But this only deletes the feature group you are currently working on. How can one delete feature groups from prior sessions? I tried deleting them out of the S3 bucket directly, but they still appear in the Feature Store UI.
It would be great if feature groups could be deleted through the UI. But if not, is there a way to delete a feature group using its full name, i.e. the one that was created using:
"my-feature-group-" + strftime("%d-%H-%M-%S", gmtime())
You can create a FeatureGroup object and call delete(), or use the CLI or the SageMakerFeatureStoreRuntime client.
source: aws
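For example, a minimal sketch with the SageMaker Python SDK; the full name used here is a made-up example of what the strftime() pattern above would produce:

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

# 'my-feature-group-14-22-01-35' is a hypothetical full name
fg = FeatureGroup(name='my-feature-group-14-22-01-35',
                  sagemaker_session=sagemaker.Session())
fg.delete()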
You can loop over list_feature_groups as follows:
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

def extract_feature_groups(feature_groups):
    list_feature_groups = []
    list_feature_groups.extend([x['FeatureGroupName'] for x in feature_groups['FeatureGroupSummaries']])
    next_token = '' if 'NextToken' not in feature_groups else feature_groups['NextToken']
    while next_token != '':
        page_feature_groups = boto_client.list_feature_groups(NextToken=next_token)
        list_feature_groups.extend([x['FeatureGroupName'] for x in page_feature_groups['FeatureGroupSummaries']])
        next_token = '' if 'NextToken' not in page_feature_groups else page_feature_groups['NextToken']
    return list_feature_groups

region_name = '<your_region_name>'
boto_client = boto3.client('sagemaker', region_name=region_name)
boto_session = boto3.session.Session(region_name=region_name)
fs_sagemaker_session = sagemaker.Session(boto_session=boto_session)
feature_groups = boto_client.list_feature_groups()
list_features_groups = extract_feature_groups(feature_groups)
for fg in list_features_groups:
    # Make sure to include an appropriate name filter and/or a confirmation prompt here
    feature_group = FeatureGroup(name=fg, sagemaker_session=fs_sagemaker_session)
    feature_group.delete()
Feature groups take time to delete; you might want to add a function that checks that deletion has completed successfully.
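For instance, a sketch of such a check, polling describe_feature_group until the service reports the group gone (client as above; the polling interval is arbitrary):

import time

def wait_for_deletion(boto_client, feature_group_name, poll_seconds=5):
    # describe_feature_group raises ResourceNotFound once deletion completes
    while True:
        try:
            status = boto_client.describe_feature_group(
                FeatureGroupName=feature_group_name)['FeatureGroupStatus']
            if status == 'DeleteFailed':
                raise RuntimeError('Deletion failed for ' + feature_group_name)
        except boto_client.exceptions.ResourceNotFound:
            return
        time.sleep(poll_seconds)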

Amazon S3 - ColdFusion's fileExists breaks when file was deleted by s3cmd

I'm running a site on ColdFusion 9 that stores cached information on Amazon S3.
The ColdFusion app builds the files and puts them into Amazon S3. Every N hours, the cache gets flushed with a bash script that executes s3cmd del, because it's much more efficient than ColdFusion's fileDelete or directoryDelete.
However, after the file has been deleted by s3cmd, ColdFusion will still flag it as an existing file, even though it won't be able to read its contents.
For the ColdFusion app, I provide the S3 credentials in Application.cfc, and they are the same authentication keys used by s3cmd, so I don't think it's a user-permission issue.
Let's run through the process:
// Create an S3 directory with 3 files
fileWrite( myBucket & 'rabbits/bugs-bunny.txt', 'Hi there, I am Bugs Bunny' );
fileWrite( myBucket & 'rabbits/peter-rabbit.txt', 'Hi there, I am Peter Rabbit' );
fileWrite( myBucket & 'rabbits/roger-rabbit.txt', 'Hi there, I am Roger Rabbit' );
writeDump( var = directoryList(myBucket & 'rabbits/', 'true', 'name' ), label = 'Contents of the rabbits/ folder on S3' );
// Delete one of the files with ColdFusion's fileDelete
fileDelete( myBucket & 'rabbits/roger-rabbit.txt' );
writeDump( var = directoryList(myBucket & 'rabbits/', 'true', 'name' ), label = 'Contents of the rabbits/ folder on S3' );
// Now, let's delete a file using the command line:
[~]$ s3cmd del s3://myBucket/rabbits/peter-rabbit.txt
File s3://myBucket/rabbits/peter-rabbit.txt deleted
writeDump( var = directoryList(myBucket & 'rabbits/', 'true', 'name' ), label = 'Contents of the rabbits/ folder on S3' );
// So far, so good!
// BUT!... ColdFusion still thinks that peter-rabbit.txt exists, even
// though it cannot display its contents
writeOutput( 'Does bugs-bunny.txt exist?: ' & fileExists(myBucket & 'rabbits/bugs-bunny.txt') );
writeOutput( 'Then show me the content of bugs-bunny.txt: ' & fileRead(myBucket & 'rabbits/bugs-bunny.txt') );
writeOutput( 'Does peter-rabbit.txt exist?: ' & fileExists(myBucket & 'rabbits/peter-rabbit.txt') );
writeOutput( 'Then show me the content of peter-rabbit.txt: ' & fileRead(myBucket & 'rabbits/peter-rabbit.txt') );
// Error on fileRead(peter-rabbit.txt) !!!
I agree with the comment by @MarkAKruger that the problem here is latency.
Given that ColdFusion can't consistently tell whether a file exists, but it DOES consistently read its up-to-date contents (and consistently fails to read them when they are not available), I've come up with this solution:
string function cacheFileRead(
    required string cacheFileName
){
    var strContent = '';
    try {
        strContent = fileRead( ARGUMENTS.cacheFileName );
    } catch (Any e) {
        strContent = '';
    }
    return strContent;
}
This answer assumes latency is your problem as I have asserted in the comments above.
I think I would keep track of when s3cmd is run. If you are running it via CFEXECUTE, store a timestamp in the Application scope, a file, or a DB table. Then, when checking for a file, if the command has run in the last N minutes (you'll have to experiment to figure out what makes sense), recache automatically. Once N minutes have passed you can rely on your system of checks again.
If you are not running s3cmd from cfexecute, try creating a script that updates the timestamp in the application scope, then add a curl command to your s3cmd script that hits your CF script, keeping the two processes in sync.
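For illustration, a minimal sketch of that timestamp check (written in Python for brevity; in ColdFusion the timestamp would live in the Application scope, and the N-minute window is a guess you would tune):

from datetime import datetime, timedelta

last_s3cmd_run = None                  # set whenever the s3cmd flush script runs
TRUST_WINDOW = timedelta(minutes=10)   # the 'N minutes' to experiment with

def record_s3cmd_run():
    global last_s3cmd_run
    last_s3cmd_run = datetime.utcnow()

def can_trust_file_exists():
    # Shortly after a flush, fileExists() may be stale, so recache instead
    if last_s3cmd_run is None:
        return True
    return datetime.utcnow() - last_s3cmd_run > TRUST_WINDOW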
Your other option is to use fileExists() constantly (not a good idea: very expensive), or to keep track of what is or is not cached some other way that can be updated in real time, a DB table for example. You would then need to clear the table from your s3cmd script (perhaps using the mysql command line).
I may think of something else for you. That's all I have for now. :)

Is there a way to add multiple jobs using HadoopJarStepConfig jarConfig = new HadoopJarStepConfig(HADOOP_JAR);

I have written an AWS SWF workflow whose first action is to boot a cluster and run a MapReduce program. This action also has two other MapReduce jars to execute, depending on the first jar's output. I am using this to add the jars:
HadoopJarStepConfig jarConfig = new HadoopJarStepConfig(S3N_HADOOP_JAR);
jarConfig.setArgs(ARGS_AS_LIST);
HadoopJarStepConfig jarConfig1 = new HadoopJarStepConfig(S3N_HADOOP_JAR);
jarConfig1.setArgs(ARGS_AS_LIST1);
try {
    StepConfig enableDebugging = new StepConfig()
        .withName("Enable debugging")
        .withActionOnFailure("TERMINATE_JOB_FLOW")
        .withHadoopJarStep(new StepFactory().newEnableDebuggingStep());
    StepConfig runJar = new StepConfig(HADOOP_JAR, jarConfig);
    StepConfig runJar1 = new StepConfig(HADOOP_JAR, jarConfig1);
    request.setSteps(Arrays.asList(new StepConfig[]{enableDebugging, runJar, runJar1}));
    RunJobFlowResult result = emr.runJobFlow(request);
Is this the correct way to add multiple jars? Thanks.
Use:
request.withSteps(enableDebugging, runJar, runJar1);
Don't use:
request.setSteps(Arrays.asList(new StepConfig[]{enableDebugging, runJar, runJar1}));
Building a StepConfig array here is wrong; you don't need it, since withSteps accepts the steps directly.
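For comparison, here's a sketch of submitting the same multi-step flow with boto3 (Python); the step names, jar paths, args, and job flow id are all placeholders:

import boto3

emr = boto3.client('emr', region_name='us-east-1')

# Each dict becomes one step; EMR runs them in order
steps = [
    {
        'Name': 'first-mapreduce-job',
        'ActionOnFailure': 'TERMINATE_JOB_FLOW',
        'HadoopJarStep': {'Jar': 's3://my-bucket/jobs/job.jar',
                          'Args': ['arg1', 'arg2']},
    },
    {
        'Name': 'second-mapreduce-job',
        'ActionOnFailure': 'TERMINATE_JOB_FLOW',
        'HadoopJarStep': {'Jar': 's3://my-bucket/jobs/job.jar',
                          'Args': ['arg3', 'arg4']},
    },
]

response = emr.add_job_flow_steps(JobFlowId='j-XXXXXXXXXXXXX', Steps=steps)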

How to access unwanted save file name and delete it?

I need to filter my raster image by a fixed threshold, so I use ILogicalOp functions. Whenever I use them, an output file is saved to the workspace, which is unwanted given my large database. The save happens exactly after rasOut[i] = RMath.LessThan(inputRas[i], cons01). How can I prevent this? Or how can I get the saved file name and delete it? Any comments would be appreciated.
private IGeoDataset[] CalcColdThreshold(IGeoDataset[] inputRas)
{
    IGeoDataset[] rasOut = new IGeoDataset[inputRas.Length];
    IGeoDataset emptyRas = null;
    ILogicalOp RMath;
    RMath = new RasterMathOpsClass();
    IRasterAnalysisEnvironment env;
    env = (IRasterAnalysisEnvironment)RMath;
    IWorkspaceFactory workspaceFactory = new RasterWorkspaceFactoryClass();
    IWorkspace workspace = workspaceFactory.OpenFromFile(System.IO.Path.GetFullPath(workSpace_save.Text), 0);
    env.OutWorkspace = workspace;
    IRasterMakerOp Rmaker = new RasterMakerOpClass();
    IGeoDataset cons01;
    Threshold_value = 15000;
    cons01 = Rmaker.MakeConstant(Threshold_value, false);
    for (int i = 0; i < inputRas.Length; i++)
    {
        rasOut[i] = RMath.LessThan(inputRas[i], cons01);
    }
    return rasOut;
}
(disclaimer: I'm not actually a C# programmer, just trying to provide some pointers to get you going since no one else seems to have any answers.) (converted from comment)
The IScratchWorkspaceFactory interface sounds like it will do what you want: instead of creating your workspace variable with IWorkspaceFactory.OpenFromFile, try creating a scratch workspace instead. According to the documentation, it will be cleaned up automatically when your application exits.
Just remember to use a different workspace for your final output. :)

Sitecore Clear Cache Programmatically

I am trying to publish programmatically in Sitecore. Publishing works fine, but doing so programmatically doesn't clear the Sitecore cache. What is the best way to clear the cache programmatically?
I am trying to use the web service that comes with the staging module, but I am getting a bad request exception (Exception: The remote server returned an unexpected response: (400) Bad Request.). I tried increasing the service receiveTimeout and sendTimeout in the client-side config file, but that didn't fix the problem. Any pointers would be greatly appreciated.
I am using the following code:
CacheClearService.StagingWebServiceSoapClient client = new CacheClearService.StagingWebServiceSoapClient();
CacheClearService.StagingCredentials credentials = new CacheClearService.StagingCredentials();
credentials.Username = @"sitecore\adminuser";  // verbatim string so the backslash stays literal
credentials.Password = "***********";
credentials.isEncrypted = false;
bool s = client.ClearCache(true, dt, credentials);  // dt: a DateTime defined elsewhere
I am using the following code to publish:
Database master = Sitecore.Configuration.Factory.GetDatabase("master");
Database web = Sitecore.Configuration.Factory.GetDatabase("web");
string userName = @"default\adminuser";
Sitecore.Security.Accounts.User user = Sitecore.Security.Accounts.User.FromName(userName, true);
user.RuntimeSettings.IsAdministrator = true;
using (new Sitecore.Security.Accounts.UserSwitcher(user))
{
    Sitecore.Publishing.PublishOptions options = new Sitecore.Publishing.PublishOptions(master, web,
        Sitecore.Publishing.PublishMode.Full, Sitecore.Data.Managers.LanguageManager.DefaultLanguage, DateTime.Now);
    options.RootItem = master.Items["/sitecore/content/"];
    options.Deep = true;
    options.CompareRevisions = true;
    options.RepublishAll = true;
    options.FromDate = DateTime.Now.AddMonths(-1);
    Sitecore.Publishing.Publisher publisher = new Sitecore.Publishing.Publisher(options);
    publisher.Publish();
}
In Sitecore 6, the CacheManager class has a static method that will clear all caches. The ClearAll() method is obsolete.
Sitecore.Caching.CacheManager.ClearAllCaches();
Just a quick note: as of Sitecore 6.3, that is not needed anymore. Caches are cleared automatically after a change happens on a remote server.
Also, if you are on earlier releases, instead of clearing all caches you can do partial cache clearing.
There is a free shared source component called Stager that does that.
http://trac.sitecore.net/SitecoreStager
If you need a custom solution, you can simply extract the source code from there.
I got this from Sitecore support. It clears all caches:
Sitecore.Context.Database = this.WebContext.Database;
Sitecore.Context.Database.Engines.TemplateEngine.Reset();
Sitecore.Context.ClientData.RemoveAll();
Sitecore.Caching.CacheManager.ClearAllCaches();
Sitecore.Context.Database = this.ShellContext.Database;
Sitecore.Context.Database.Engines.TemplateEngine.Reset();
Sitecore.Caching.CacheManager.ClearAllCaches();
Sitecore.Context.ClientData.RemoveAll();
The out-of-the-box solution Sitecore provides for clearing caches (ALL of them) is the following page: http://sitecore_instance_here/sitecore/admin/cache.aspx. Its code-behind looks like the following snippet:
foreach (var cache in Sitecore.Caching.CacheManager.GetAllCaches())
    cache.Clear();
Via the SDN:
HtmlCache cache = CacheManager.GetHtmlCache(Context.Site);
if (cache != null) {
    cache.Clear();
}