Let's get to the example of a simple counter for which the frequency of updates is too high for a single entity (such a counter could be used for counting the number of views for a YouTube video):
class SimpleCounterShard(models.Model):
"""Shards for the counter"""
count = models.IntegerField(default=0)
name = models.CharField(primary_key=True,
max_length=500)
NUM_SHARDS = 20
@classmethod
def get_count(cls):
"""
Retrieve the value for a given sharded counter.
"""
total = 0
for counter in SimpleCounterShard.objects.all():
total += counter.count
return total
@classmethod
@commit_locked
def increment(cls):
"""
Increment the value for a given sharded counter.
"""
index = random.randint(0,
SimpleCounterShard.NUM_SHARDS - 1)
shard_name = 'shard' + str(index)
counter = SimpleCounterShard.objects.get_or_create(
pk=shard_name)[0]
counter.count += 1
counter.save()
The basic idea is to divide the counter into N sub-counters and to compute the counter's real value (
get_count
) by summing over all values of the N sub-counters. While trying to increment the counter's real value (increment_count
) we select one of the N sub-counters at random. As a result we avoid contention. By increasing the number of shards you can increase the maximum throughput.There exists an App Engine specific article about sharding counters from Joe Gregorio. If you want to learn more about sharding counters take a look at that article.
I ported the code (including GeneralCounterShard) from that article to native Django such that it can theoretically be used for any database supporting optimistic transactions. The code is available here and you can test the live demo.
Django's advantage
Now we can get to an exciting example of how Django's ORM can get us out of the stone age of non-relational database development practice. Let's see an example using F() objects:YouTubeVideo.objects.filter(pk=keyname).update(The database backend can detect such "counting updates" and use sharding for App Engine or other techniques like updates via background tasks for other databases automatically in order to perform such an update. Thus Django gives us the advantage of an additional layer of abstraction. Moreover, we can formulate "counting updates" independently of the database used, so we can switch to a different database without having to change the code.
views_count=F('views_count')+1)
Keep in mind that sharding is a technique which can be used for much more than just counters. This is our first App Engine app ported to native Django. If you plan to port some other App Engine app to Django you can use the sharding counter port to get an idea of how to do it. And please let us know about it.
I wrote a version of the CounterShardConfig that support decrements correctly.
ReplyDeleteThe original Sharded counter article uses memcache.incr(name) but name can't be negative. The trick is to maintain a counter.offset property.
total += counter.count
total -= counter.offset
Thanks for your idea, this is a nice improvement.
ReplyDeleteYou are right, with the current port it's not possible to decrement the counter. But i just wanted to port Joe Gregorio's code :)
There exists a simpler method of decrementing such a counter without having to maintain two different properties. You can take a look at this here:
http://bitbucket.org/gumptioncom/gaegene/src/tip/counter/models.py
Bye,
Thomas