Our process was pretty straightforward. We (OK, I) picked Memcache for three reasons:
1) Automatic expiration of records means I don't have to do my own garbage collection.
2) AWS offered ElastiCache, which was built on Memcache and would let us move to Couchbase if needed (it isn't).
3) Our Ops folks suggested Memcache because they've had the fewest problems dealing with it.
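
To make point 1 concrete, here is a minimal sketch of what per-record expiration buys. It is an in-memory stand-in, not a Memcache client: the class name ExpiringStore, its methods, and the key format are all hypothetical, and a real cache would do the expiry on the server side.

```python
import time


class ExpiringStore:
    """In-memory stand-in for a cache with per-record expiration (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._records = {}    # key -> (value, absolute expiry time)

    def set(self, key, value, ttl_seconds):
        """Store a value that stops being visible after ttl_seconds."""
        self._records[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the value, or None if it is missing or has expired."""
        entry = self._records.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it lazily on read, so no separate sweep is needed.
            del self._records[key]
            return None
        return value
```

The caller never runs a cleanup job; stale records simply stop being returned once their TTL has passed.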

Since I started working on things, a few other things have happened:
1) AWS now offers a managed Redis service
2) Redis became a bit more reliable in a subsequent release. (I haven't done a lot of rigorous checks there, since most of the problems come about because Redis REALLY doesn't like it when memory gets full.)
3) Some of the storage requirements changed.

Ah, the joys of initial implementations of a service. This is why abstraction is a dear, sweet friend of mine.
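
As a rough illustration of the kind of abstraction meant here (not the actual server code), the calling code can depend on a small storage interface so the backend can be swapped later. The names Storage, PerChannelStorage, PerAgentStorage, and record_update are hypothetical, and both backends are plain in-memory dicts standing in for a real data store.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """The only surface the rest of the service is allowed to touch."""

    @abstractmethod
    def put(self, uaid, chid, version):
        """Record the latest version for a channel."""

    @abstractmethod
    def get(self, uaid, chid):
        """Return the stored version, or None if unknown."""


class PerChannelStorage(Storage):
    """One small record per (UAID, CHID) pair."""

    def __init__(self):
        self._data = {}

    def put(self, uaid, chid, version):
        self._data[f"{uaid}.{chid}"] = version

    def get(self, uaid, chid):
        return self._data.get(f"{uaid}.{chid}")


class PerAgentStorage(Storage):
    """One record per UAID that holds all of its CHIDs."""

    def __init__(self):
        self._data = {}

    def put(self, uaid, chid, version):
        self._data.setdefault(uaid, {})[chid] = version

    def get(self, uaid, chid):
        return self._data.get(uaid, {}).get(chid)


def record_update(store: Storage, uaid, chid, version):
    """Service-side code: works unchanged with either backend."""
    store.put(uaid, chid, version)
    return store.get(uaid, chid)
```

Swapping the record layout, or the data store behind it, then only means writing another Storage subclass.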

Honestly, the thing I'm having the hardest time trying to work out is the optimal way that records should be stored and accessed in the various data stores. As anyone with a link to http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html can tell you, pulling stuff off a wire is one of the worst things you can do. So, the question is "Do you pull lots of data in one block, or lots of blocks of little data?" Either way, there's going to be a factor of inefficiency. (E.g., with a single record organized by UAID, someone who has a lot of CHIDs but only one or two very active ones would have a pretty good-sized block of extra data pumped back and forth, whereas with one record per CHID, someone with a lot of very active CHIDs would have to do lots of little updates.)
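
A toy model of that trade-off, under loudly stated assumptions: records are JSON-encoded, a single per-UAID record is read once and written once per batch of updates, and per-CHID records cost one write each. The function names and the cost model are illustrative only and are no substitute for real use metrics.

```python
import json


def single_record_cost(channels, active_chids):
    """(round trips, bytes moved) when all CHIDs live in one per-UAID record."""
    if not active_chids:
        return (0, 0)
    blob = json.dumps(channels).encode("utf-8")
    # One read of the whole record plus one write of the whole record,
    # no matter how few channels actually changed.
    return (2, 2 * len(blob))


def per_channel_cost(channels, active_chids):
    """(round trips, bytes moved) when each CHID is its own small record."""
    sizes = [
        len(json.dumps({chid: channels[chid]}).encode("utf-8"))
        for chid in active_chids
    ]
    # One small write per updated channel.
    return (len(sizes), sum(sizes))
```

With 100 channels and one active CHID, the single record moves far more bytes than the per-CHID layout; with all 100 active, the per-CHID layout needs 100 round trips against 2.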

I'm not really going to try to address it right now, since it's premature optimization without actual use metrics, and there be both dragons and hairy yaks there, but I tend to like having options on hand for when things start to go wonky.

I'll be interested to see what y'all pick, since your constraints and benefits aren't going to be the same.

On 2013/9/25 5:16 AM, Daniel Bevenius wrote:
To get a feel for what would be involved in using a key/value datastore, we've done some experimenting with Redis[1]. There might be other non-relational databases better suited to this, or perhaps Redis is a good choice; I don't know. But I think we should decide whether this is worth pursuing, and if so which database to use, before spending more time on it.
Let us know what you think.

[1] https://gist.github.com/danbev/6606289#using-redis-as-a-data-store


On 19 September 2013 18:08, JR Conlin <jrconlin@gmail.com> wrote:
On 2013/9/19 5:18 AM, Lucas Holmquist wrote:

On Sep 19, 2013, at 12:34 AM, Daniel Bevenius <daniel.bevenius@gmail.com> wrote:

>I wonder what kind of numbers would we get by ditching JPA completely and using a non-relational DB like Redis
Yeah, I think we will most likely need to if we want to come close to the other implementations performance-wise. Others use Memcache and I've seen MongoDB in use as well.

Perhaps I should just add performance tests for the rest of the SimplePush operations so that we have them covered and then look into using a non-relational DB. Once that is done we can revisit this performance task.
What do people think about that?

+1, relational DBs are dinosaurs
Hardly. It's just a question of what the right tool for a given job is. (I'll note that Google is spending quite a bit of time and effort improving MariaDB because they use a LOT of relational DBs for very large data.)

In this case, however, it's pretty easy to reduce things to simple key/value. I picked Memcache partly because it auto-expires records, which means that I don't have to do garbage collection on uncollected records. If you switched to an alternate schema (such as keeping a single record per UAID that contained all the CHID data as well as stuff like the proprietary info or other crap), you could even use simple flat files and skip the DB requirement altogether.

We were kicking around the idea of only storing undeliverable data in the DB, and instead letting each websocket connector manage its own data. For our implementation, I've already prioritized delivery over storage for connected clients and seen a fair bit of improvement on delivery. (Remember, SimplePush is not a 100% guaranteed delivery system, so please avoid using it for nuclear reactor management or pacemakers.)
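
One way the "deliver first, store only what can't be delivered" idea might look, as a sketch rather than the real implementation: the Router class, its connections map, and the dict used as the fallback store are all hypothetical stand-ins.

```python
class Router:
    """Tries the live connection first and stores only undelivered updates."""

    def __init__(self, store):
        self.connections = {}  # uaid -> callable that sends on the live socket
        self.store = store     # dict-like fallback for undelivered updates

    def notify(self, uaid, chid, version):
        send = self.connections.get(uaid)
        if send is not None:
            try:
                send(chid, version)
                return "delivered"
            except ConnectionError:
                # The socket died mid-send: forget it and fall back to storage.
                self.connections.pop(uaid, None)
        # Client is not connected (or just dropped), so keep the update for later.
        self.store[(uaid, chid)] = version
        return "stored"
```

Connected clients never touch the data store on the happy path, which is where the delivery improvement would come from; the trade-off is that an update lost after a successful send is not recoverable, in keeping with SimplePush not guaranteeing delivery.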

We'll probably hold off on doing further memory refinement until we get some actual use data, but I like having options available.





On 19 September 2013 06:03, Bruno Oliveira <bruno@abstractj.org> wrote:
Hmmm tempting idea :)

> On Sep 19, 2013, at 12:23 AM, Douglas Campos <qmx@qmx.me> wrote:
>
> That's a nice report!
>
> I wonder what kind of numbers would we get by ditching JPA completely
> and using a non-relational DB like Redis...
>
> --
> qmx
> _______________________________________________
> aerogear-dev mailing list
> aerogear-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/aerogear-dev