AWS SDK in Perl - web-services

Whether it is possible to create AWS SDK in Perl?. I need to use AWS transcoder service from my perl script. But I wonder AWS SDK is not available for Perl(http://aws.amazon.com/code). Or do they have any other method to use PHP SDK in a Perl script?

The API is just "sending specific things over HTTP". You don't need a language specific library for that, although it does make things easier. Anyone can write such a wrapper, and some people already have done that for Perl.

Years later, there is now Paws, a Perl AWS interface. It's on CPAN.

It's fairly easy to write your own Perl modules to work with the AWS API. As remarked above, if you can make HTTP calls and create an HMAC signature, any language can do it.
However, there are already a lot of Perl modules on CPAN that address specific AWS functions, such as S3 or EC2. Go to http://www.cpan.org/src/ to search for what you need (e.g., SNS). You'll generally find something that will meet your need.

http://www.timkay.com/aws/
I have found Tim Kay's "aws" and "s3" tools quite useful. They are written in Perl.
It has the added advantage of --exec, so you can append commands directly to the output, in their original state from AWS. It has been a terror for me to have international characters and other junk floating about as sad excuse for file names. With Tim's toolset, I was able to workaround the problem by using the --exec to call for the prefix of the filename (also unique) and then act upon it directly, instead of mucking about with metacharacters and other nonsense.
For example:
/123/456/789/You can't be serious that this is really a filename.txt
/123/456/901/Oh!Yes I can! *LOL* Honest!.txt
To nuke the first one:
aws ls --no-vhost mybucketname/123/456/789/ --exec='system "aws", "rm", "--no-vhost", "$bucket/$key"'
Simply put, the tool performs an equivalent "ls" on the S3 bucket, for that prefix, and returns ALL file names in that prefix, which are passed into the exec function. From there, you can see I am blindly deleting whatever files are held within.
(note: --no-vhost helps resolve bucketnames with periods in them and you don't need to use long URLs to get from point a to point b.)

Related

Working with strings in GCP-Workflows and GCP-Admin

I'm integrating a project in GCP-Workflows with GCP-Admin, but I'm having trouble working with some data, when extracting a date it is delivered in this format: 2020-12-28T11: 20: 05.000Z, so I can't turn the string into int, and apparently there is no function in GCP like substring() either. I need to use the date with an IF, checking if it is greater or less than the reference.
How can I do this?
There is some lack of function implementation for now in Workflows. New ones are coming very soon. But I don't know if they will solve your problem
Anyway, with workflows, the correct pattern, if a built-in function isn't implemented, is to call an endpoint, for example a Cloud Function or a Cloud Run, which perform the transformation for you and return the expected result.
Quite boring to do, but don't hesitate to open feature request on the issues tracker product team is very reactive and love user feedbacks!
The Workflows standard library now includes a text module with functions for searching (including regular expressions), splitting, substrings and case transformations.

AWS Tools for Powershell, version differences

I have been testing an older AWS Tools install using AWSToolsAndSDKForNet_sdk-3.3.398.0_ps-3.3.390.0_tk-1.14.4.1.msi and a newer install using AWSToolsAndSDKForNet_sdk-3.5.2.0_ps-4.1.0.0_tk-1.14.5.0.msi. The code that I am using to test with is
Set-AWSCredential -AccessKey:$ACCESSKEY -SecretKey:$SECRETKEY -StoreAs:default
$items = Get-S3Object -BucketName:$BUCKETNAME -Region:'eu-west-1' -Key:'revit/2020'
Write-Host "$($items.Length) items"
$count = 1
foreach ($item in $items) {
Write-Host "$count $($item.key)"
$count ++
}
I am seeing VERY different behavior, and can't figure out why. With 3.3 the code works as intended, I end up with a list of files in my bucket and key. Performance is pretty decent, it takes a moment but I have about 5000 files in may "subfolders".
When I run this with 4.1 it takes 3-5 times as long and returns nothing.
It seems that Help is a bit different too. A first run of get-help Get-S3Object -detailed will take as long as 10 minutes to run, with CPU, Memory and Disk access often at 99% utilization. A second run is quite quick. 3.3 Does nothing of the sort.
So, is this current build of AWS Tools for Powershell just not ready for prime time? My searches for AWS Tools 4.1 performance have turned up nothing.
For what it is worth, I am using the MSI installer because I need the install to actually work consistently, and the NuGet approach has been very problematic on a number of production workstations. But if there is another option I would love to look at it. The main issue is I need ultimately to do the install and immediately load the modules and work with AWS. I don't have that working with the MSI based install yet, but that's for a different thread.
It looks like they changed the results from Get-S3Object. You will need to add -Select S3Objects.Key to get the results you're looking for (or just -select *). Here's the excerpt from the change notes:
Most cmdlets have a new parameter: -Select. Select can be used to change the value returned by the cmdlet. For example the service API used by Get-S3Object returns a ListObjectsResponse object but the cmdlet is configured to return only the S3Objects field. Now you can specify -Select * to receive the full API response. You can also specify the path to a nested result property like -Select S3Objects.Key. In certain situations it may be useful to return a cmdlet parameter, this can be achieved with -Select ^ParameterName.
Found by going to the Change Notes and doing a CTRL+F for Get-S3Object. Hope this resolves it for you!

Creating QR Codes with ColdFusion

Is there any way with pure ColdFusion/cfscript to produce a QR code, without relying on external APIs or JavaScript?
No. ColdFusion cannot generate bar codes by itself. You need a separate tool or library. It is easy enough to install a java library, like ZXing. Then generate the images from CF. Alternately, you could do a <cfhttp> call to an external server that generates the bar code image for you, or basically do the same thing with javascript. You would not need to install anything for the latter two (2) options. But they still rely on an external resource.
Bottom line you need something more than just ColdFusion. What is the reason you cannot use either an external API or javascript? Because without either of those, you are probably out of luck.
Edit based on comments:
If the only restriction is the images must generated locally, then you can use ZXing as described in the link above -OR- any of the other components/libraries mentioned in the other responses, like Joe's suggestion which uses iText (though also based on ZXing).
Some other external APIs
http://cfbarbecue.riaforge.org/
http://zanstra.com/my/Barcode.html?barcode=3PTSP8827A231
If you really wanted to, you could look up (perhaps you need to buy?) the encoding standard for QR codes, which I believe is an ISO standard. Then you could write a program which would output a table with the appropriate number of rows and columns, each with either a black or a white background. I wouldn't recommend this form of "rolling your own" though; it's a lot of work to do essentially what's been done before.
Tim Cunningham wrote a library that is hosted on Github that utilizes iText that does just this very thing. https://github.com/boltz/QRToad

Google Bot information?

Does anyone know any more details about google's web-crawler (aka GoogleBot)? I was curious about what it was written in (I've made a few crawlers myself and am about to make another) and if it parses images and such. I'm assuming it does somewhere along the line, b/c the images in images.google.com are all resized. It also wouldn't surprise me if it was all written in Python and if they used all their own libraries for most everything, including html/image/pdf parsing. Maybe they don't though. Maybe it's all written in C/C++. Thanks in advance-
you can find a bit about how googlebot works here:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=158587
for example the "fetch as googlebot" tool lets you see a page as Googlebot sees it.
The crawler is very likely written in C or C++, at least backrub's crawler was written in one of these.
Be aware that the crawler only takes a snapshot of the page, then stores it in a temporary database for later processing. The indexing and other attached algorithms will extract the data, for example the image references.
Officially allowed languages at Google, I think, are Python/C++/Java.
The bot likely uses all 3 for different tasks.

RTF / doc / docx text extraction in program written in C++/Qt

I am writing some program in Qt/C++, and I need to read text from Microsoft Word/RTF/docx files.
And I am looking for some command-line program that can make that extraction. It may be several programs.
The closest thing I found is DocToText, but it has several bugs, so I can't use it.
I have also Microsoft Word installed on the PC. Maybe there is some way to read text using it (have no idea how to use COM)?
Now, this is pretty ugly and pretty hacky, but it seems to work for me for basic text extraction. Obviously to use this in a Qt program you'd have to spawn a process for it etc, but the command line I've hacked together is:
unzip -p file.docx | grep '<w:t' | sed 's/<[^<]*>//g' | grep -v '^[[:space:]]*$'
So that's:
unzip -p file.docx: -p == "unzip to stdout"
grep '<w:t': Grab just the lines containing '<w:t' (<w:t> is the Word 2007 XML element for "text", as far as I can tell)
sed 's/<[^<]>//g'*: Remove everything inside tags
grep -v '^[[:space:]]$'*: Remove blank lines
There is likely a more efficient way to do this, but it seems to work for me on the few docs I've tested it with.
As far as I'm aware, unzip, grep and sed all have ports for Windows and any of the Unixes, so it should be reasonably cross-platform. Despit being a bit of an ugly hack ;)
Try Apache Tika
I recommend not to use COM as this would defeat the usage of a portable library like Qt in the first place.
You might want to use the classic catdoc or a similar tool such as wvWare.
Note that although the catdoc author claims that catdoc doesn't work under Windows, there is a posting of 2001 which states the opposite.
To read .doc files you can use the structured storage API. A .doc is basically a structured storage repository with various streams corresponding to the various parts of the document.
Be warned that it is quite a hairy API and that even using this API, a .doc file can be quite messy to look at.
Ofcouse this is still windows only but atleast it's not COM. just a plain old C API.
This might help. It is cross-platform and has an API http://www.winfield.demon.nl/
Otherwise the iFilter methods are the way to go if this is windows only. It will allow you to parse anything that has an iFilter on your system. Here is examples of this http://the-lazy-programmer.com/blog/?p=8 . I have used iFilter from the C# end of things quite a bit.