Can Amazon Simple Workflow (SWF) be made to work with jRuby? - amazon-web-services

For uninteresting reasons, I have to use jRuby on a particular project where we also want to use Amazon Simple Workflow (SWF). I don't have a choice in the jRuby department, so please don't say "use MRI".
The first problem I ran into is that jRuby doesn't support forking and SWF activity workers love to fork. After hacking through the SWF ruby libraries, I was able to figure out how to attach a logger and also figure out how to prevent forking, which was tremendously helpful:
AWS::Flow::ActivityWorker.new(
swf.client, domain,"my_tasklist", MyActivities
) do |options|
options.logger= Logger.new("logs/swf_logger.log")
options.use_forking = false
end
This prevented forking, but now I'm hitting more exceptions deep in the SWF source code having to do with Fibers and the context not existing:
Error in the poller, exception:
AWS::Flow::Core::NoContextException: AWS::Flow::Core::NoContextException stacktrace:
"aws-flow-2.4.0/lib/aws/flow/implementation.rb:38:in 'task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:292:in 'respond_activity_task_failed'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:204:in 'respond_activity_task_failed_with_retry'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:335:in 'process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:388:in 'poll_and_process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:447:in 'run_once'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:419:in 'start'",
"org/jruby/RubyKernel.java:1501:in `loop'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:417:in 'start'",
"/Users/trcull/dev/etl/flow/etl_runner.rb:28:in 'start_workers'"
This is the SWF code at that line:
# #param [Future] future
# Unused; defaults to **nil**.
#
# #param block
# The block of code to be executed when the task is run.
#
# #raise [NoContextException]
# If the current fiber does not respond to `Fiber.__context__`.
#
# #return [Future]
# The tasks result, which is a {Future}.
#
def task(future = nil, &block)
fiber = ::Fiber.current
raise NoContextException unless fiber.respond_to? :__context__
context = fiber.__context__
t = Task.new(nil, &block)
task_context = TaskContext.new(:parent => context.get_closest_containing_scope, :task => t)
context << t
t.result
end
I fear this is another flavor of the same forking problem and also fear that I'm facing a long road of slogging through SWF source code and working around problems until I finally hit a wall I can't work around.
So, my question is, has anyone actually gotten jRuby and SWF to work together? If so, is there a list of steps and workarounds somewhere I can be pointed to? Googling for "SWF and jRuby" hasn't turned up anything so far and I'm already 1 1/2 days into this task.

I think the issue might be that aws-flow-ruby doesn't support Ruby 2.0. I found this PDF dated Jan 22, 2015.
1.2.1
Tested Ruby Runtimes The AWS Flow Framework for Ruby has been tested
with the official Ruby 1.9 runtime, also known as YARV. Other versions
of the Ruby runtime may work, but are unsupported.

I have a partial answer to my own question. The answer to "Can SWF be made to work on jRuby" is "Yes...ish."
I was, indeed, able to get a workflow working end-to-end (and even make calls to a database via JDBC, the original reason I had to do this). So, that's the "yes" part of the answer. Yes, SWF can be made to work on jRuby.
Here's the "ish" part of the answer.
The stack trace I posted above is the result of SWF trying to raise an ActivityTaskFailedException due to a problem in some of my activity code. That part is my fault. What's not my fault is that the superclass of ActivityTaskFailedException has this code in it:
def initialize(reason = "Something went wrong in Flow",
details = "But this indicates that it got corrupted getting out")
super(reason)
#reason = reason
#details = details
details = details.message if details.is_a? Exception
self.set_backtrace(details)
end
When your activity throws an exception, the "details" variable you see above is filled with a String. MRI is perfectly happy to take a String as an argument to set_backtrace(), but jRuby is not, and jRuby throws an exception saying that "details" must be an Array of Strings. This exception blows through all the nice error catching logic of the SWF library and into this code that's trying to do incompatible things with the Fiber library. That code then throws a follow-on exception and kills the activity worker thread entirely.
So, you can run SWF on jRuby as long as your activity and workflow code never, ever throws exceptions because otherwise those exceptions will kill your worker threads (which is not the intended behavior of SWF workers). What they are designed to do instead is communicate the exception back to SWF in a nice, trackable, recoverable fashion. But, the SWF code that does the communicating back to SWF has, itself, code that's incompatible with jRuby.
To get past this problem, I monkey-patched AWS::Flow::FlowException like so:
def initialize(reason = "Something went wrong in Flow",
details = "But this indicates that it got corrupted getting out")
super(reason)
#reason = reason
#details = details
details = details.message if details.is_a? Exception
details = [details] if details.is_a? String
self.set_backtrace(details)
end
Hope that helps someone in the same situation as me.

I'm using JFlow, it lets you start SWF flow activity workers with JRuby.

Related

AWS Translate: Get DetectedLanguageCode from DetectedLanguageLowConfidenceException

I'm playing around with AWS Translate a bit. I want AWS Translate to auto-detect the source language, when I send a TranslateTextAsync request. Apparently, there can be a DetectedLanguageLowConfidenceException, which I want to handle by getting the DetectedLanguageCode from the exception and retry the translation. I was not able to get this exception to occur, so I don't know the structure of that response exception.
For the Java SDK, I found that there is a "getDetectedLanguageCode" function, but this one doesn't exist in the .NET SDK. I'm using AWSSDK.Translate v3.3.101.12.
How do I get the language code from the DetectedLanguageLowConfidenceException?
I contacted AWS Support and they reached out to their AWS Translate team. They write that
C#/.Net does not support member variables in exceptions the way Java does. However, supplementary information about exceptions is stored in the Data dictionary of the exception
They also mention that AWS Translate will usually use even a low confidence guess before throwing a DetectedLanguageLowConfidenceException, so it seems like we don't really have to worry about it.
I still went and implemented the exception handling and have the following code to extract the detected language code data. This code is untested though:
catch (DetectedLanguageLowConfidenceException ex)
{
var dictionary = ex.Data as Dictionary<object, object>;
var detectedLanguageCode = dictionary?["DetectedLanguageCode"] as string;
// Retry here with the detected low confidence language code.
}

Selenium Python Script, InvalidElementStateException

So I'm getting this error every so often when running the exact same test.
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=69.0.3497.100)
(Driver info: chromedriver=2.41.578706 (5f725d1b4f0a4acbf5259df887244095596231db),platform=Mac OS X 10.12.6 x86_64)
The only problem is that it seems to happen inconsistently to different areas of the code. It's when trying to access DOM elements, like a search field, of my ReactJS page. I'm running this through ROBOT Automation Framework, using a mix of the SeleniumLibrary and a custom library.
I've read that it's simply as it sounds, the xPath as become outdated on the DOM, but that doesn't help me figure out why it's an inconsistent error happening almost anywhere at any point.
EDIT: It seems to be happening in this:
def filter_modality(self, filter):
filter_value = '//span[#title="{}"]//parent::li'.format(filter)
self.selib.click_element(filter_locator)
self.selib.wait_until_page_contains_element('//*[#class="multi-selector-options open"]')
self.selib.wait_until_element_is_visible(filter_value)
self.selib.click_element(filter_value )
self.selib.wait_until_page_contains_element('//div...[#class="option selector-item active"]',
error=("Could not select filter: {}".format(filter_value )))
#I get the stale element message from or after executing this click_element
self.selib.click_element(filter_locator)
self.selib.wait_until_page_does_not_contain_element('//*[#class="multi-selector-options open"]',
error="Filter dropdown did not disappear after selection")
The exception comes when SE has found an element, but shortly afterwards something (a JS function) has changed it - removed from/rearranged it in the DOM, or changed its attributes significantly, so it's no longer tracked as the same element.
This comes for the fact that SE builds an internal cache of the DOM, which may become desynchronized from the actual one; thus the name "stale" - the el. is cached in some state, but its actual form is now different.
The issue is that common, that there is a specific SO tag for it - https://stackoverflow.com/questions/tagged/staleelementreferenceexception (I myself was surprised from this).
The common solutions are:
sleep for some seconds before an event you know will cause the issue
re-get an element before its usage (if you store a reference to it in a WebElement object, not really the case with robotframework)
have a rertry mechanism on working with an element that you know may cause the execution
swallow the exception and move along (I've done it in a few places, where the element was just a confirmation an operation was executed - it has been shown/found by SE once, afterwards I don't care is it changed in the DOM)

Problems with MSF4J and #MatrixParam

Folks, I have found what seems to be a problem with / (bug in ?) MSF4J as including an #MatrixParam annotated variable in a URI causes the affected (micro)service to either 'hang' indefinitely, or if accessed via a browser, to give a "404 Not Found" message for the path/endpoint, even when correct.
Here is a code fragment that illustrates the problem - it compiles ok (eclipse/maven) and deploys without errors using microservicesrunner() in the usual way.
package org.test.service;
import javax.ws.rs.GET;
import javax.ws.rs.MatrixParam;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;
#Path("books")
public class MPTest { // MatrixParam Test
#GET
#Produces(MediaType.TEXT_PLAIN)
#Path("/query")
// method to respond to 'GET' requests
public Response getListOfBooks(#MatrixParam("Author") String author) {
// do something in here to get book data from DB and sort by titles
List<String> titles = .......;
return Response.status(200) .entity("List of Books by " +author+ "ordered by title " + titles).build();
}
}
With this code fragment, accessing the URL "(host:8080)/books/query;Author=MickeyMouse" should cause a list of books by that author to be retrieved from the DB (I have omitted the actual code that does so for clarity, as it is not relevant to this post).
However, it does not get there, so that code isnt executed. As far as I can tell with a debugger, no #MatricParam value is retrieved - it remains null until the process times out. Things like curl and wget just hang until they time out, and from a browser, the best I can get is a 404 not found error for the URI, even though it is valid.
However, if I replace the #MatrixParam with a #PathParam it works perfectly, and can I get the URL string retrieved in its entirity. The URI that I get is as expected - no odd hex characters, no typos, and so forth. The URI entered is what you get back. So, no problem there.
Behaviour is also consistent across platforms (couple of flavours of Linux, and three versions of Windoze), so it is not anything to do with the OS itself. Similarly, I get the same behavior with multiple clients and tools, so it isnt a problem there either.
So, it appears to be a problem within the MSF4J framework / domain, and I could use some support / help / suggestions here as I've reached the point of tearing my hair out..... Any ideas, folks?
The only reference I can find to a similar problem was closed as 'off topic' without a reply (see Rest API Matrix param annotation) so I think that this needs re-opening as it seems to be a genuine problem....
Regards, and thanks in advance for any help,
Rick
#MatrixParam is not supported with MSF4J at the moment. You can create a GitHub issue. So we can implement that support in future releases.

Rails - run pusher in background

I use Pusher in my Rails-4 application.
The problem is that sometimes the connection is slow, so the execution of the code becomes slower.
I also get from time to time the following error:
Pusher::HTTPError: execution expired (HTTPClient::ConnectTimeoutError)
I send signals via Pusher with this code:
Pusher[channel].trigger!(event, msg)
I would like to execute it in background, so if an exception is thrown it will not break the flow of my app, and neither slow it down.
I tried to wrap the call with begin ... rescue but it didn't solve the exception problem. Of course even if it would, it wouldn't solve the slow-down problem i want to avoid.
Information on performing asynchronous triggers can be found here:
https://github.com/pusher/pusher-gem#asynchronous-requests
This also provides you within information on catching/handling errors.
Finally I implemented this solution:
Thread.new do
begin
Pusher[channel].trigger!(ch, ev, msg)
ActiveRecord::Base.connection.close
rescue Pusher::Error => e
Rails.logger.error "Pusher error: #{e.message}"
end
end

CF extended components suddenly stop working

We have a set of Coldfusion applications that all extended various parts of an application base. I'll provide a bit of code and then explain the issues we are having and see if anyone can shed light on the best way to trouble shoot this:
In our "OnRequestStart" in the app.cfc we have the following line to initiate a user:
if(!structKeyExists(SESSION, 'user'))
SESSION.user = CreateObject("component","cfcs.ds_user");
Then in the ds_user.cfc we call it like so:
component extends="cas.cas_user" displayname="basic_user"{
The application and all its parts run just like they should. However, in a seeming random manner after a while, the application will crash and I have to restart ColdFusion Service to get it running again. The error I get is:
Could not find the ColdFusion component or interface cas.cas_user.
So, for whatever reason after a while, my application decides it cannot find the path to the parent component. The mapping for that cfc is in the application.cfc at the top as so:
THIS.mappings["/cas"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'cas30')#assets\cfcs\";
I want to be sure to say this, the application works perfectly as designed for a random amount of time and then it cannot find the parent component and will not find it again until I restart the ColdFusion Service on the server.
I figure this is somehow a memory leak or something, but I have no idea where to start looking to troubleshoot the issue. We have 6 or so other applications that are extended in the same way and work fine and never crash, but this one does.
EDIT: To be more clear on the mappings. Our applications are located:
root.com/app1
root.com/app2
We created mappings to grab cfcs from app2 while in app1 using the method above. The method, while I believe sort of strange, does work in all of our applications.
EDIT: The correct mappings that display for a while are:
/cfcs - D:\www\app1\assets\cfcs
/templates - D:\www\app1\assets\templates
/cas - D:\www\app2\assets\cfcs
/common - D:\www\app3\assets\common_elements
However once the Application goes in "crashed mode", the dump reveals the mappings are as follows:
/cfcs - D:\www\app1\assets\cfcs
/templates - D:\www\app1\assets\templates
/cas - D:\www\app1\assets\cfcs
/common - D:\www\app1\assets\common_elements
And here is how those mappings are defined at the start of the Application.cfc:
currpath = GetDirectoryFromPath(GetCurrentTemplatePath());
THIS.mappings["/templates"] = "#currpath#assets\templates";
THIS.mappings["/cfcs"] = "#currpath#assets\cfcs";
THIS.mappings["/common"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'gum')#assets\common_elements\";
THIS.mappings["/cas"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'cas30')#assets\cfcs\";
THIS.name = digisign_CAAAFACBFDFFE or
name_var = (arrayLen(meta_array) >= 2) ? meta_array[arrayLen(meta_array) - 1] & '_' : 'root_';
THIS.name = name_var & right(reReplace(hash(getCurrentTemplatePath()), "[^a-zA-Z]","","all"), 64 - len(name_var));
Where could it be failing. It seems the replace statement isn't working and therefore the appname in the path is not being changed from app1 to app2 when setting the mappings. is it possible this is related to this error we are currently working through: http://forums.adobe.com/message/4657868#4657868 We have yet to apply the Update 4 patch on production. However this problem we believe was happening before CF10. And while we have this issue, it only cropped up recently. This application in question has been crashing like this for a long time.
EDIT:
1. I guess when I say "crash" I mean the application gets into a state, where it will not declare the mappings correctly until I restart Coldfusion. I assume the error in our code causes the crash.
2.This is usually where the issue occurs, when doing this check of the SESSION.user var. I believe it has happened as well, it decides it cannot find our datasource. This is rare.
3. At first I thought yes, but actually no, not that many. Throughout our apps we have several names for common mappings. cas common cfcs templates etc. However D:\www\cas is where the application domain.com/cas30 is located. However a legacy version of that app is located at domain.com/cas. The mapping /cas should go to D:\www\cas30\assets\cfcs and works.
4.We have a dev setup and this never happens. (I assumed it was a load issue which is why it doesn't happen on dev). However, our dev environment is structred as so:
D:\www\deva\app1
D:\www\deva\app2
D:\www\devb\app1
D:\www\devb\app2
What we do (which I think is stupid) is we have a file located not in the same dir as the current app. This file is called application_base.cfc. All of the application.cfcs in the other applications extend from this application_base.cfc. They are not extended from other Application.cfc files. (hope that makes sense) In application_base is a init, onrequeststart, and an onerror. I'll post the App.cfc below. Also some setting are read from XML files both in the application base (to determine environment stuff) and at the application level. However we thought that might be causing the issue so the previous developer removed the xml file at the application level.
6.Yes. I'll post the app.cfc and the appbase.cfc so you can view both.
By reinitialize you mean call onapplicationstart or something. Not that I know of.
A few applications we have do:
currpath = GetDirectoryFromPath(GetCurrentTemplatePath());
app_path = ListToArray(currpath,'\');
THIS.name = app_path[ArrayLen(app_path)];
This one does:
meta_array = ListToArray(GetMetaData(this).name,'.');
name_var = (arrayLen(meta_array) >= 2) ? meta_array[arrayLen(meta_array) - 1] & '_' : 'root_';
THIS.name = name_var & right(reReplace(hash(getCurrentTemplatePath()), "[^a-zA-Z]","","all"), 64 - len(name_var));
A few others do this as well. Not sure if it was two different developers or something, but that is the way it is.
Once the app fails, it fails until I restart coldfusion. The app requires login from the domain.com/app page, so (not saying it cant change from request to request) but the request location is always the same where it's failing.
God I wish it wasn't this complex. I recently pulled our current CMS off of alot of this crazy stuff, but we have 7 or 8 applications that are so intertwined with each other and designed to work in dev/prod environments with different paths, its sometimes hard to tell what I can remove and what I can't.
I thought I tried dumping the applicationname from our error handler, but I thought it didn't work unless passed in. I passed through the mappings so I could see them which is how I know digisign is not changing to cas30 like it should in "crash" mode.
I think all the dynamic mappings were so the original developer could just use the same app.cfc template without changing anything. He liked to do stuff like var a = (b) ? (a-c) ? a-f+b : (a+b) ? d : d; : a; h; crap with no comments so it sometimes hard to just read the damn code let alone debug it.
EDIT
I feel like this issue and stackoverflow.com/q/14300915/1229594 issue may be related. I've posted some more details here as well: forums.adobe.com/message/5022377#5022377
First things first: why are you initialising session-oriented stuff in onREQUESTStart()? If you inited that in onSessionStart(), you'd not need to check for its existence every request, which - whilst trivial - is unnecessary overhead, and is simply the wrong code in the wrong place.
Secondly... you quote your error, but don't say where it's happening. Is it happening in that line in onRequestStart()?
If so, do me a favour: put a try/catch around it, and within that write the value of this.mappings to a log file, as well as the value of currPath. How is the value of that variable being derived, btw?
That said, I think if you just put that session.user init code in the right place, it'll solve your problem.
NB: frame this problem as almost certainly not a memory leak (ie: ColdFusion's fault), but your code doing something you did not anticipate (so... err... your fault ;-). This will help focus better on finding the problem. I'm not having a go at you, but "where is my code wrong" is a better approach than "it's probably something else". And more likely to be correct ;-)
Oh... and what version of CF are you on?
Take a look at this and see if it's relevant to your problem.
https://github.com/Mach-II/Mach-II-Framework/wiki/Application-Specific-Mapping-Workaround
If not, then it could have something to do with application specific mappings of the same name, on the same CF server, with those applications having different application names.
Some questions:
Are you assuming the crash is being caused by the code error, or that the code error is occurring because of the crash?
Is the instantiation of the session user the only line of code that you see these path errors?
Do you have any physical directories in your app that have the same name as the mapping names?
Does this occur in any other environments (dev/test)? Is this a clustered environment?
Are there multiple Application.cfc files extending this same Application.cfc?
Is there any code that is directly calling Application.cfc methods?
Are there any bits of code that cause the application to reinitialize itself?
What is determining the meta_array that is being used to derive the application name?
A few observations:
It seems to me that the application name is getting changed or that some other application is overwriting with the same name. This doesn't seem far-fetched as there's an awful lot of dynamic naming going on here. Starting with the application name, it's dependent on the current template's physical location, which could be different from request to request, depending on how the app routes requests. If the current template varies, the application name will vary, and cause the other app-specific mappings to change, which would cause a cascading effect to all the other mappings that use the app name to determine the physical location of those mappings.
Which begs the question: Why is all this dynamic evaluation of the application name and mapping locations even necessary? Can it be simplified or hard-coded? Can you instead use a server mapping? If it doesn't have to be this complex, simplifying it to its barest essentials will help troubleshooting and may clear up the issue entirely.
Finally, can you verify that the application name during normal operation is the same application name being referenced when the errors are occurring?
If they are different, then something is causing the application to execute within a different context (see my initial questions above for clues). A sudden change in the application name would invalidate any existing sessions and force the session user instantiation code to re-run. And because the user component paths are based in part on the application name, the paths may no longer be correct.
But if the application names are the same between normal operation and crash mode, then most likely the currpath variable is being affected by some part of the application being executed in a different physical path than expected. Since currpath is directly used in determining the rest of the mappings, that could certainly explain why an unexpected path could cause the component to go missing.
Because there are so many variables going into deriving these names, you would be well served to log those variables during normal operation and during crash mode. You'll want to see
GetCurrentTemplatePath()
GetDirectoryFromPath(GetCurrentTemplatePath())
THIS.name
meta_array
THIS.mappings
I suspect you'll find something significantly different in these variables when operating normally and when the crash/errors are occurring, and that difference should lead you closer to the answer.