Memcached vs. Ehcache (for Java)
First, let’s look at how simple each one is to get up and running. I’ll assume you already have a Java environment set up with an IDE or text editor of your choice (Eclipse plus viPlugin ftw), because you are a Java developer and you need that. I’m also assuming you want to try both and know nothing about either beyond what you heard from a friend of a friend:
Memcached steps (takes about 10 minutes or less depending on your internet connection):
- google memcached
- sudo apt-get install memcached; sudo /etc/init.d/memcached start
- google memcached java, download java client, read howTo
- add jar to classpath, copy paste client code example, modify for your needs
- build, run
Ehcache steps (takes from 10 minutes to 4 hours depending on your project and java framework skillz):
- google ehcache
- download, tar -xzf, add jars to classpath, or join the 21st century and use your dep management software to add ehcache to your project (maven: add the ehcache group/artifactid/version to your project’s pom file)
- google ehcache example, realize you should use spring, google ehcache spring example
- get spring (or add it to your pom, hint google maven2 spring) and make your project use it, or realize you already have spring in your project and rejoice that you made a good decision.
- copy and modify ehcache example, realize that the example shows you how to use ehcache and to easily use AOP to cache any method call, yell “JACKPOT”.
- build, run, realize ehcache is using a ‘failsafe’ config which doesn’t sound good, but works for testing
- see the ehcache.xml in your exploded tar.gz, modify it and add it to the classpath ’cause you like things running without warnings
Winner: memcached. Unless you already know maven and spring and have them running in your project, you are looking at a significant time and/or learning curve to get these things set up (although both are optional). Memcached was super easy, a junior dev could add it to the project. Ehcache was not too hard, but you wouldn’t want a junior dev adding spring and maven to your project.
Next, let’s look at performance and features. Remember we are using Java, so we have a couple of things to think about. First, our JVM heap is finite and will eventually run out, so we need a caching solution that can use secondary storage outside of the JVM. Second, we are doing caching more sophisticated than in-memory maps, so our project may eventually need to be clustered/distributed for scaling, and our cache must do that too. Next, it would be nice to be able to do extra fancy stuff like expire things, monitor the lifecycle of cached objects, and monitor cache effectiveness based on hits and misses. Finally, the whole point of a cache is to speed things up, so our cache needs to be speedy.
Memcached:
- Only uses secondary storage (the memcached server’s own memory). It runs client/server, so it never uses JVM memory for cached data and always goes over a TCP connection (more on that later).
- Because it’s client/server, it scales very easily, and it is very simple to configure multiple servers for failover, distributed load, etc. It’s actually as easy as starting it on another machine and adding that machine to your client config.
- Memcached may or may not do some fancy things, but all our client can do is basic put, get, remove, clear, and get a map of stats.
- Remember how memcached is client/server? That adds overhead for serialization and your transport protocol. This makes it slower than an in-process cache. Some say as slow as querying mysql.
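To make the serialization point concrete, here is a minimal sketch. It is not code from any real memcached client: it uses plain Java serialization as a stand-in for whatever encoding your client actually applies, and the names SerializationCostSketch, toBytes and fromBytes are made up for illustration. The idea is that an in-process map hands back the very same object, while anything that crosses a client/server boundary has to be flattened to bytes and rebuilt as a copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of any memcached or Ehcache API.
final class SerializationCostSketch {

    // What a remote cache client must do before a value can leave the JVM.
    public static byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // What the client must do with the bytes that come back from the server.
    public static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> value = new ArrayList<>(Arrays.asList("alice", "bob", "carol"));

        // In-process cache: a map lookup returns the same object reference.
        Map<String, Object> inProcess = new HashMap<>();
        inProcess.put("users", value);
        System.out.println(inProcess.get("users") == value); // true

        // Remote-style cache: the value is serialized, then rebuilt as a copy.
        byte[] wire = toBytes(value);
        Object copy = fromBytes(wire);
        System.out.println(copy == value);      // false
        System.out.println(copy.equals(value)); // true
        System.out.println("bytes on the wire: " + wire.length);
    }
}
```

The byte count and the extra object allocation are the cost paid on every get and put, before any network time is added on top.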
Ehcache:
- Uses JVM memory, then overflows to secondary storage based on config. Configurable out the wazoo for how it handles that.
- Scaling through distributed caches and replication has been available for several versions now. It’s not as straightforward to set up (read: lots of xml editing and documentation), but it can automatically find new nodes in the cluster through multicast, etc.
- The E H in Ehcache stands for Extremely <something that starts with H and means feature rich>. It can set expiry time based on last accessed, evict based on policies (LRU, FIFO, custom), register listeners for tracking objects’ lifecycle through the cache, and gives tons of metrics for effectiveness like # of hits each object gets. It even has a way for monitoring remote caches. But all of these features come with a configuration cost and headache.
- Because it uses JVM memory, objects don’t have to be serialized, which means it’s very fast. Overflow to secondary (disk) storage can be asynchronously batched so the hit is negligible.
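For a feel of what LRU eviction and hit/miss metrics involve, here is a small sketch built on the JDK’s LinkedHashMap. It is not Ehcache’s API or its implementation: LruCacheSketch and its methods are hypothetical names, and it skips expiry, listeners and disk overflow entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; Ehcache's real classes look nothing like this.
final class LruCacheSketch<K, V> {
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    public LruCacheSketch(final int maxEntries) {
        // accessOrder = true keeps entries ordered least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once we are over capacity.
                return size() > maxEntries;
            }
        };
    }

    public synchronized V get(K key) {
        V value = entries.get(key); // also marks the entry as recently used
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    public synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    // containsKey does not count as an access, so it leaves the LRU order alone.
    public synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    public synchronized long hits() {
        return hits;
    }

    public synchronized long misses() {
        return misses;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // hit; "a" is now the most recently used
        cache.put("c", "3"); // over capacity, so "b" (least recently used) goes
        cache.get("b");      // miss
        System.out.println(cache.contains("a")); // true
        System.out.println(cache.contains("b")); // false
        System.out.println(cache.hits() + " hit(s), " + cache.misses() + " miss(es)");
    }
}
```

Even this toy version needs a capacity, an ordering policy and counters, which hints at where the configuration cost of a full-featured cache comes from.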
Winner: Ehcache. Ehcache easily wins this category. We aren’t talking about ease of use, we are talking about pure abilities. It’s faster and more feature rich.
So it’s a tie. Awesome. Everyone loves a competition that ends in a tie. But that isn’t really the case. Depending on your application, there is a clear winner. If you need something that is easy to set up and use, will give you a performance increase and scalability without hassle, and will probably wash your dishes, go with memcached. If performance is of utmost importance and you have complex caching needs that go beyond get/put, and especially if you are already using spring, and possibly hibernate’s 2nd level cache, you need to use Ehcache. However, to keep this post from totally copping out at the end, I will go so far as to say this: Someone with experience using ehcache will be able to overcome any slight disadvantage it has because of complexity and configuration. It’s like riding a bicycle: once you can start and turn left, turning right isn’t hard, you just change things a little.
Therefore…..
OVERALL WINNER: Ehcache. Hire smart, experienced devs, give them a decent timeline and let them use the better Java caching solution.
Note: RoR fanboys, please don’t cry. This post is talking about JAVA caching. Rails doesn’t have Ehcache. Use memcached and rails and get your website up in 28 minutes, that’s fine with me.
If you’re running on a single server, and on java, it sounds like EHCache is the way to go. I’d be willing to bet that in-process caches are always faster. I’m curious about EHCache’s horizontal scaling, though, because it would seem like then you’d have to deal with serialization, which I’d guess becomes a bottleneck.
As far as the link showing EHCache being significantly faster than Memcached, I think the comparison is flawed. It looks like EHCache was running in-process, while Memcached can *only* run in distributed mode.
Also, if the cache is forced to overflow to disk, that’s going to provide a big performance hit (granted, memcached just starts expiring entries when it fills up). Memcached is nice in that you can cluster it and take advantage of several instances so you don’t have to worry about kernel memory allocations (an issue on ghetto 32bit machines, probably like the one we host our stuff on).
I could be wrong, but I don’t think memcached uses HTTP, just TCP. If the instance is on the same machine as the server, then hitting the loopback device is short circuited in any kernel worth a damn, so network traffic isn’t really an issue (for that small case).
The configurability of EHCache is nice, and it does pack in well with Spring and Hibernate, but still, depending on your setup, even if you are using Java, it might not be the right choice: http://tsmarsh.blogspot.com/2008/04/saga-of-ehcache.html
EHCache is good for small projects.
EhCache does not scale horizontally. It only works fast for up to 1G of data. Try caching over 2G …
EHCache + terracotta can easily scale horizontally and can cache way more than 2G performantly.
We use ehcache backed by memcached and it scales very, very, very well.