Problems managing cluster connections in production [JIRA: CLIENTS-697] #121

@qix

We've been using Riak in production for around six months now and have had a ton of issues, primarily around connection handling.

Our system usually requires values stored in Riak in a very bursty fashion, requesting up to roughly 500 keys within a few milliseconds. We were shocked to realize that each get request requires its own connection, and that the default maxConnections is set to 10000. This essentially means that whenever we request the keys, each key opens its own connection and overwhelms the Riak servers [keep in mind this is happening on 50-100 boxes at similar times.]

In an ideal world we would open a connection to each riak server, send all the commands down them in round-robin fashion and then wait for responses. I understand the protocol requires a roundtrip for every request right now which is its own problem -- I'm not sure if there is anything in the pipeline to solve that.

The logic in queueCommands is useless to us, as it creates a whole lot of CPU load by repeatedly creating 500 timeouts. It also takes (N [requests] / M [connections]) * T [queueSubmitInterval] time. With twenty connections our five hundred gets would take 10+ seconds to fetch, and that's ignoring the speed/latency of the actual Riak servers. Yes, we could drop queueSubmitInterval, but setting it low then causes a ton of CPU burn creating useless timers.
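
To put numbers on that estimate, here is a small sketch of the (N / M) * T arithmetic. The function name is made up for illustration, and the 500 ms interval is an assumed value picked to match the "10+ seconds" figure; the actual interval in use is not stated here.

```javascript
// Rough lower bound on how long an interval-driven queue takes to drain,
// following the (N / M) * T estimate above. Server latency is ignored.
// `estimateDrainMs` is a hypothetical helper, not part of any client library.
function estimateDrainMs(requests, connections, queueSubmitIntervalMs) {
  // Each interval tick can hand out at most `connections` commands.
  const rounds = Math.ceil(requests / connections);
  return rounds * queueSubmitIntervalMs;
}

// 500 gets over 20 connections, with an assumed 500 ms interval.
console.log(estimateDrainMs(500, 20, 500)); // 12500
```

Dropping the interval to 10 ms brings the same burst down to 250 ms, but only by firing the timer far more often, which is the CPU trade-off described above.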

I know this was a bunch of complaints and not much in the line of solutions... we're actually looking at switching datastores for our simpler "write once" key-value requests, which will alleviate most of the load. As a stop-gap we've implemented a super simple RiakCluster on our end which creates a bunch of RiakClients and load-balances across them properly.
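
For anyone wanting to try the same stop-gap, the balancing part can be as small as the sketch below. This is not our actual RiakCluster code: `RoundRobinPool` is a hypothetical name, and plain strings stand in for long-lived client instances so the example runs on its own.

```javascript
// Hypothetical sketch: rotate through a fixed set of long-lived clients
// instead of opening a new connection per request.
class RoundRobinPool {
  constructor(clients) {
    if (clients.length === 0) throw new Error('need at least one client');
    this.clients = clients;
    this.next = 0;
  }

  // Return the next client in rotation.
  pick() {
    const client = this.clients[this.next];
    this.next = (this.next + 1) % this.clients.length;
    return client;
  }
}

// Strings stand in for real client objects here.
const pool = new RoundRobinPool(['riak-a', 'riak-b', 'riak-c']);
const picks = [];
for (let i = 0; i < 5; i++) picks.push(pool.pick());
console.log(picks.join(' ')); // riak-a riak-b riak-c riak-a riak-b
```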

Some suggestions that would help a ton:

  • Drop the maxConnections default to something more sane. Perhaps 100?
  • Get rid of queueSubmitInterval, and instead have a list of waiting commands that get popped whenever a node is free.
  • Update the protocol so that multiple requests can be sent down a single connection [not likely - I know.]
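
A minimal sketch of what the second suggestion could look like, assuming a fixed number of slots stands in for connections. `WaitingQueue`, `submit` and `drain` are hypothetical names, not the client's API. Commands wait in a list and are started the moment a slot frees up, with no timers involved.

```javascript
// Hypothetical sketch: each task is a function that receives a `done`
// callback and calls it when its request has completed.
class WaitingQueue {
  constructor(maxInFlight) {
    this.maxInFlight = maxInFlight;
    this.inFlight = 0;
    this.waiting = [];
  }

  // Queue a task and start it right away if a slot is free.
  submit(task) {
    this.waiting.push(task);
    this.drain();
  }

  // Start waiting tasks for as long as there are free slots.
  drain() {
    while (this.inFlight < this.maxInFlight && this.waiting.length > 0) {
      const task = this.waiting.shift();
      this.inFlight++;
      let finished = false;
      task(() => {
        if (finished) return; // ignore a second call to done()
        finished = true;
        this.inFlight--;
        this.drain(); // a slot just freed up, so pop the next command
      });
    }
  }
}

// Two slots, three commands: the third starts only when one finishes.
const queue = new WaitingQueue(2);
const started = [];
const finishers = [];
for (const key of ['k1', 'k2', 'k3']) {
  queue.submit((done) => { started.push(key); finishers.push(done); });
}
console.log(started.join(' ')); // k1 k2
finishers[0]();
console.log(started.join(' ')); // k1 k2 k3
```

With this shape the time to drain a burst depends only on how fast the servers answer, not on a polling interval.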
