Communication errors with Amazon SWF - Ruby Flow


We are having an issue with the new Ruby Flow wrapper for Amazon SWF.

The issue is that our workflow and activity workers are frequently (several times an hour) unable to communicate correctly with the SWF server. This manifests in various ways:

  • Workflows or activities fail to register when workers for new versions are started.
  • Workflow or activity workers crash.
  • Activity workers finish a task but get an error when reporting that it is done, so the entire execution fails.

For worker crashes (of either kind), we see the following:

andy@andy-mbp:crucible $rails_env=development rake crucible:swf:ingress_wf_start
rake aborted!
execution expired
/users/andy/.rvm/gems/ruby-1.9.3-p448@rails3/gems/aws-sdk-1.11.1/lib/aws/core/http/connection_pool.rb:301:in `start_session'
/users/andy/.rvm/gems/ruby-1.9.3-p448@rails3/gems/aws-sdk-1.11.1/lib/aws/core/http/connection_pool.rb:125:in `session_for'
/users/andy/.rvm/gems/ruby-1.9.3-p448@rails3/gems/aws-sdk-1.11.1/lib/aws/core/http/net_http_handler.rb:52:in `handle'
/users/andy/.rvm/gems/ruby-1.9.3-p448@rails3/gems/aws-sdk-1.11.1/lib/aws/core/client.rb:238:in `block in make_sync_request'

When the failure involves failing to tell the server that a task has finished, the backtrace is pretty similar.

This doesn't seem to be an SWF issue per se (that is, it's not a timeout on the activity execution); it's a Ruby HTTP communication issue. There are similar issues reported when communicating with the Twitter API.

Again, it's not an issue of an SWF timeout expiring; the workflow has a timeout of a day, and each activity has a timeout of an hour. The failures occur well within those boundaries.

Unfortunately, while it mostly works and we can start workflow executions, we get this sort of error often enough that we cannot finish anything other than trivial jobs. The errors are random enough that troubleshooting is extremely difficult.

We have reproduced this on different machines and on different networks. We're still trying out SWF in development, so none of the failing workers are located on EC2 instances.


Is there an underlying cause we should investigate?
Is there a pattern or setting that would allow me to retry these communications?

I have discussed this issue at length with the Amazon person who maintains the Ruby Flow library. The discussion may be found here.

The issue is that our computers are unable to open a connection to the AWS servers. Setting the retry count obscenely high has addressed the issue for our development purposes.
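For anyone who wants a retry at the call site rather than (or in addition to) a raised retry count, here is a minimal sketch of a retry wrapper with exponential backoff. It is not what the Ruby Flow library does internally; `with_retries` and its options are hypothetical names made up for illustration, and the list of retried exceptions is an assumption based on the "execution expired" message in the backtrace, which is what Ruby's `Timeout::Error` reports. Separately, aws-sdk 1.x has a `:max_retries` configuration option, though whether that is the setting that was raised here is not stated.

```ruby
require 'timeout'

# Hypothetical helper, not part of aws-sdk or aws-flow.
# Runs the block, retrying on connection-level errors with exponential backoff.
# Written with an options hash so it also runs on Ruby 1.9.3.
def with_retries(opts = {})
  max_attempts = opts.fetch(:max_attempts, 5)
  base_delay   = opts.fetch(:base_delay, 0.5) # seconds before the first retry
  # Assumption: these are the errors worth retrying; adjust to what you observe.
  retry_on     = opts.fetch(:retry_on, [Timeout::Error, Errno::ECONNRESET, Errno::ETIMEDOUT])
  sleeper      = opts.fetch(:sleeper, Kernel.method(:sleep)) # injectable for testing

  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *retry_on
    raise if attempt >= max_attempts          # give up and re-raise the last error
    sleeper.call(base_delay * (2**(attempt - 1))) # 0.5s, 1s, 2s, ...
    retry                                     # re-run the begin block
  end
end
```

A call would look like `with_retries(:max_attempts => 10) { ... }`, with the block containing whatever call starts the workflow or reports the task result. Retrying blindly is only reasonable when the wrapped call is safe to repeat.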

The root cause seems to be in Mac OS X. We use it for development, and saw the timeout issue on several different machines (running 10.7 and 10.8).

A fresh Linux machine did not exhibit the problem.
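To compare machines like this, one option is to time connection setup directly, outside the SDK. The following is a rough sketch under that assumption, using only Ruby's standard library; `time_connect` is a hypothetical helper, and the host in the usage comment is just an example endpoint to be replaced with the one for your region.

```ruby
require 'socket'
require 'timeout'

# Hypothetical diagnostic helper; not part of aws-sdk or aws-flow.
# Returns the number of seconds it took to open a TCP connection,
# or nil if the connection could not be opened within timeout_seconds.
def time_connect(host, port, timeout_seconds = 15)
  started = Time.now
  Timeout.timeout(timeout_seconds) do
    TCPSocket.new(host, port).close # open the connection, then drop it
  end
  Time.now - started
rescue Timeout::Error
  nil
end

# Example usage (the endpoint is an assumption; substitute your own):
#   20.times do
#     elapsed = time_connect('swf.us-east-1.amazonaws.com', 443)
#     puts(elapsed ? format('connected in %.3fs', elapsed) : 'timed out')
#   end
```

Running the loop on an affected machine and on one that works should show whether the stalls happen at connection setup, which is where the backtrace above points (`start_session`).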

