#.think.in
learn.create.enjoy

Get in the Queue, your locks don't work here

February 26, 2009 01:21 by tarn

I am, of course, talking about queues, a pattern in high-scale web application design. More specifically, the Queue service on the Windows Azure cloud computing platform.

So when do you use Queues?

It's common to use thread synchronization objects in a web application to serialize access to a resource, but this approach doesn't work so well when you have multiple applications, on multiple servers, competing for exclusive access to a global resource.

The first scenario I've found which suits the Queue service is a user-driven content ranking system. I suspect I'll find many more places to use the Queue service as I continue to learn more about developing and designing applications on the cloud platform.

In the web application I'm building I want users to be able to rate dynamic content and I also want to record the number of views. This is a very common concept which allows the users to provide feedback which can then be shared with other users and used to dynamically rank the content.

I can start by implementing some tables, one to store views and one to store votes. All I need to do is insert a row into the corresponding table for every view or vote. I can then query the tables to get all the views and votes for each content item. While this will work, it won't scale. Imagine querying a YouTube video view table! It's certainly not the sort of task you could do while the user is waiting for a page to render.      

You may be able to resolve this problem with a background process, using some thread locks, to periodically calculate and cache this information on each web server. While I may end up using some caching on each web server, storing all the pages and their counts on each web server may not be feasible. Another problem is that all the web servers would be doing the same data analysis, which doesn't seem an efficient use of resources.

A nice solution for this type of problem is to use the Queue service and a table for the summarized data. Instead of adding page views and votes to the original tables, they can be added to a queue. A worker role can then be implemented to periodically check this queue, potentially doing some validation (or other work) on each item, and then update the summary table. This could be scaled even further, if required, by having multiple worker roles processing items from the queue.
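
To make the shape of this pattern concrete, here is a minimal sketch in plain Python. It is not Azure code: the in-process `queue.Queue` stands in for the Queue service, the dictionary stands in for the summary table, and the names `record_event` and `process_pending` are hypothetical.

```python
import queue
from collections import defaultdict

# Hypothetical stand-ins: a local in-process queue instead of the Azure Queue
# service, and a dict instead of the summary table.
events = queue.Queue()
summary = defaultdict(lambda: {"views": 0, "votes": 0})


def record_event(content_id, kind):
    """Called by the web role: validate, enqueue and return immediately."""
    if kind not in ("views", "votes"):
        raise ValueError("unknown event kind: %r" % kind)
    events.put((content_id, kind))


def process_pending():
    """Called periodically by the worker: drain the queue into the summary."""
    processed = 0
    while True:
        try:
            content_id, kind = events.get_nowait()
        except queue.Empty:
            return processed
        summary[content_id][kind] += 1
        processed += 1


record_event("post-1", "views")
record_event("post-1", "views")
record_event("post-1", "votes")
record_event("post-2", "views")
handled = process_pending()
```

The web request only pays for the enqueue; the counting happens later in one place, which is why the totals lag slightly behind the real activity.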

A not entirely desirable effect of using this pattern is that votes and page views are not immediately registered when a user votes or visits a page. I guess this is more desirable than your users not being able to view pages when a single web server reaches capacity, or users getting page time-outs as you process billions of table rows while they wait.

Continuing a theme in my recent posts, I'm going to have a play with the service using IronPython.

I've been using this little helper module, which just imports the StorageClient assembly and makes it easy to create development StorageAccountInfo objects, as they have the development credentials as default values.

import clr
clr.AddReference("StorageClient.dll")
from Microsoft.Samples.ServiceHosting.StorageClient import *
from System import Uri

class Account():
   def __init__(self):
      self.endPointUri = Uri("http://127.0.0.1:10002/")
      self.accountName = 'devstoreaccount1'
      self.accountSharedKey = 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=='
   def GetStorageAccountInfo(self):
      return StorageAccountInfo(self.endPointUri, None, self.accountName, self.accountSharedKey)
      
class BlobAccount(Account):
   def __init__(self):
      Account.__init__(self)
      self.endPointUri = Uri("http://127.0.0.1:10000/")

class QueueAccount(Account):
   def __init__(self):
      Account.__init__(self)
      self.endPointUri = Uri("http://127.0.0.1:10001/")

 

It's then pretty easy to create a queue and add an item to it

from AzureQueueHelper import *

# create a connection to the development storage service
queueStorage = QueueStorage.Create(QueueAccount().GetStorageAccountInfo())

# create a queue by name 
messageQueue = queueStorage.GetQueue("test1")

# this will create the queue if it doesn't exist, but will do nothing if it already exists
messageQueue.CreateQueue()

# create and send a message
msg = Message("testString")
messageQueue.PutMessage(msg)

 

Then get it back in another process

from AzureQueueHelper import *

# create a connection to the development storage service
queueStorage = QueueStorage.Create(QueueAccount().GetStorageAccountInfo())

# create a queue by name 
messageQueue = queueStorage.GetQueue("test1")

# this will create the queue if it doesn't exist, but will do nothing if it already exists
messageQueue.CreateQueue()

# this is the time (the visibility timeout, in seconds) you have exclusive access to the queue item
msg = messageQueue.GetMessage(5)

if msg: 
   print msg.ContentAsString()
   # message must be explicitly removed from the queue
   messageQueue.DeleteMessage(msg)
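
The reason the message has to be deleted explicitly is easier to see with a toy model. The sketch below is plain Python and only imitates the behaviour described above; `SketchQueue` and its methods are hypothetical and are not part of StorageClient. A fake clock is used so nothing actually sleeps.

```python
import itertools


class SketchQueue(object):
    """In-memory imitation of a queue with a visibility timeout (hypothetical)."""

    def __init__(self, clock):
        self._clock = clock          # callable returning the current time in seconds
        self._messages = []          # each entry is [id, body, visible_from]
        self._ids = itertools.count(1)

    def put_message(self, body):
        self._messages.append([next(self._ids), body, self._clock()])

    def get_message(self, visibility_timeout):
        now = self._clock()
        for message in self._messages:
            if message[2] <= now:
                # hide the message from other readers until the timeout expires
                message[2] = now + visibility_timeout
                return (message[0], message[1])
        return None

    def delete_message(self, message):
        self._messages = [m for m in self._messages if m[0] != message[0]]


now = [0.0]                          # fake clock so the example needs no sleeping
q = SketchQueue(lambda: now[0])
q.put_message("testString")

first = q.get_message(5)             # reader A takes the message for 5 seconds
second = q.get_message(5)            # reader B sees nothing while it is hidden
now[0] = 6.0                         # reader A never deleted it and the timeout passes
third = q.get_message(5)             # the message is visible again
q.delete_message(third)
fourth = q.get_message(5)            # gone for good after the delete
```

If a reader crashes before deleting, the message simply reappears for someone else, which is what makes the pattern safe with several workers.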

 

The asynchronous methods can be used to create a simple little event-based queue processing server.

from AzureQueueHelper import *
from System.Threading import *

# auto reset event to keep it alive
are = AutoResetEvent(False)

# create a connection to the development storage service
queueStorage = QueueStorage.Create(QueueAccount().GetStorageAccountInfo())

# create a queue by name 
messageQueue = queueStorage.GetQueue("test1")

# this will create the queue if it doesn't exist, but will do nothing if it already exists
messageQueue.CreateQueue()

# handle a message from the queue
def MessageReceived(sender, e):
   print e.Message.ContentAsString()
   messageQueue.DeleteMessage(e.Message)

# set up polling: should keep reading until the queue is empty, then poll again
# after PollInterval (10000 here, which is 10 seconds assuming the value is in milliseconds)
messageQueue.MessageReceived += MessageReceived
messageQueue.PollInterval = 10000
messageQueue.StartReceiving()

# repeat until are is set (forever in this example)
are.WaitOne()
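
For anyone curious what "read until empty, then poll" amounts to, here is a rough plain Python sketch of such a loop. It is my guess at the general behaviour, not the StorageClient implementation; `run_poller` and its parameters are hypothetical, and the sleep function is passed in so the example runs instantly.

```python
def run_poller(get_message, handle, sleep, poll_interval_seconds, max_idle_polls):
    """Drain the queue, sleep when it is empty, stop after max_idle_polls empty reads in a row."""
    idle_polls = 0
    handled = 0
    while idle_polls < max_idle_polls:
        message = get_message()
        if message is None:
            # nothing waiting: back off for one poll interval before trying again
            idle_polls += 1
            sleep(poll_interval_seconds)
            continue
        idle_polls = 0
        handle(message)
        handled += 1
    return handled


pending = ["a", "b", "c"]
seen = []
sleeps = []

handled_count = run_poller(
    get_message=lambda: pending.pop(0) if pending else None,
    handle=seen.append,
    sleep=sleeps.append,             # records the waits instead of really sleeping
    poll_interval_seconds=10,
    max_idle_polls=2,
)
```

A real server would loop forever rather than stopping after a number of idle polls; the limit is only there so the sketch terminates.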

 

It's really amazing how little is required to extend this little server into the worker process discussed in this post, using just the table services I've previously posted about.



#.think.in infoDose #18 (26th Jan - 20th Feb)

February 23, 2009 16:44 by brodie

Announcements

Architecture

 

ASP.NET/MVC/Javascript

WCF

Silverlight

Utilities

Books


 

Singularity

Other



Import AntiGravity and I'll see you on Cloud Azure

February 21, 2009 00:03 by tarn

On the topic of setting up tables on Windows Azure, Steve Marx writes:

Probably the best solution is to have separate initialization code that creates your tables.  This is analogous to the pattern of having CREATE TABLE commands scripted in T-SQL which you run once to set up the database.

In another post, Mark Seemann expands on this, demonstrates creating tables with a PowerShell script, and writes:

You could obviously write a little utility that references StorageClient and your custom TableStorageDataServiceContext.

Another, in my opinion, better option for such a one-off script is a PowerShell script

I agree, but I think you can actually take this a lot further. In my experience working on enterprise applications, it is also common to use one-off SQL scripts to add, update, maintain and manage data. The problem is that these scripts go around the ORM, which often has logic to protect the state of the data and the business requirements. I feel that if you can also expose these business objects to a scripting environment, you potentially have a very powerful way of managing your enterprise information. This is probably even more relevant for cloud data services, where there is no equivalent of low-level T-SQL to optimize: you can have full control from a scripting environment, with the tools and armour of your enterprise objects.

Anyway, enough rambling. In my previous post I demonstrated working with Azure table storage data in IronPython, and briefly glossed over creating the tables. In this post I'm going to walk through dynamically creating tables in the cloud with IronPython, and what currently needs to be done to get them working on the local development storage server.

I'm going to create some of my own custom tables and some tables for the standard providers (Membership, Roles and Session), as I am building a simple MVC application for the cloud. I've got DataModels.dll with my custom models and AspProviders.dll from the Azure SDK samples with the standard provider models. All the models extend TableStorageEntity.

Both assemblies also contain classes that extend TableStorageDataServiceContext, which is similar to the DataContext class in Linq2Sql. They have IQueryable<T> properties typed to the model classes, which represent tables. The base class has additional functionality for tracking, adding, deleting and saving model objects. We will be able to create tables by reflecting over these TableStorageDataServiceContext classes.
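
The idea of discovering tables by reflection can be illustrated without any Azure assemblies. The plain Python sketch below uses hypothetical stand-in classes (`ContextBase`, `SketchContext`, `PostRow`) and a hypothetical `table_names` function: it keeps only the types whose direct base is the context base class and treats each property name as a table name.

```python
class ContextBase(object):
    """Hypothetical stand-in for TableStorageDataServiceContext."""


class PostRow(object):
    """Hypothetical model class; not a context, so it should be skipped."""


class SketchContext(ContextBase):
    @property
    def PostTable(self):
        return []  # a real context would return a query object

    @property
    def CommentTable(self):
        return []


def table_names(types, base_type):
    """Map each direct subclass of base_type to the sorted names of its properties."""
    found = {}
    for candidate in types:
        if base_type not in candidate.__bases__:
            continue
        props = [name for name, value in vars(candidate).items()
                 if isinstance(value, property)]
        found[candidate.__name__] = sorted(props)
    return found


discovered = table_names([ContextBase, PostRow, SketchContext], ContextBase)
```

Here `discovered` maps `SketchContext` to its two table names, while the base class itself and the model class are ignored.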

 

Creating a database on the local development storage server

Unfortunately a current restriction of the local development storage server is that you can't dynamically create tables. To create tables on the local development server the DevTableGen.exe tool is used.

DevTableGen /database:TarnsDevDB /forceCreate AspProviders.dll;DataModels.dll

This should produce output like that below, indicating that the tables have been created. You can see that the provider model tables and my custom tables have been created.

Windows(R) Azure(TM) Development Table database generation tool version 1.0.0.0
for Microsoft(R) .NET Framework 3.5
Copyright (c) Microsoft Corporation. All rights reserved.

DevTableGen : Generating database 'TarnsDevDB'
DevTableGen : Generating table 'Membership' for type 'Microsoft.Samples.ServiceHosting.AspProviders.MembershipRow'
DevTableGen : Generating table 'Roles' for type 'Microsoft.Samples.ServiceHosting.AspProviders.RoleRow'
DevTableGen : Generating table 'Sessions' for type 'Microsoft.Samples.ServiceHosting.AspProviders.SessionRow'
DevTableGen : Generating table 'PostTable' for type 'DataModels.PostDataModel'
DevTableGen : Generating table 'CommentTable' for type 'DataModels.PostDataModel'

The UI for the development storage server should be found in the task bar. As it only supports one database at a time, you may have to select the database just created.


Dynamically managing a database in the cloud

I created a simple IronPython helper module to assist with creating and managing the table schema in the cloud. Like in the previous post, it's built on the StorageClient library that ships with the Windows Azure SDK. You'll notice that it's expecting the StorageClient assembly to be in one of its script paths.

There was some frustration writing it: all the TableStorageDataServiceContext classes are internal, which meant I had to write the LoadFromAssembly method and do some reflection to get the types.

import clr
clr.AddReference("StorageClient.dll")
from Microsoft.Samples.ServiceHosting.StorageClient import *
from System.Reflection import Assembly
from System import Uri

class Account():
   def __init__(self):
      self.endPointUri = Uri("http://127.0.0.1:10002/")
      self.accountName = 'devstoreaccount1'
      self.accountSharedKey = 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=='

class TableHelper():
   def __init__(self, account):
      self.account = account
      self.storageInfo = StorageAccountInfo(account.endPointUri, None, account.accountName, account.accountSharedKey)
      self.tableStorage = TableStorage.Create(self.storageInfo)
   def AddTables(self, dataServiceContext):
      TableStorage.CreateTablesFromModel(dataServiceContext, self.storageInfo)
   def DeleteAllTables(self):
      for t in self.tableStorage.ListTables():
         self.tableStorage.DeleteTable(t)
   def PrintTables(self):
      for t in self.tableStorage.ListTables():         
         print t
   def LoadFromAssembly(self,assemblyName):
      # A bit of screwing round with reflection, because the types we want are internal
      assembly = Assembly.LoadFrom(assemblyName)
      baseType = TableStorageDataServiceContext
      contexts = filter(lambda t : t.BaseType == baseType, assembly.GetTypes())
      for context in contexts:
         print context
         self.AddTables(context)

Using this module we can easily create and manage tables from the IronPython interactive console, or with a script. Below is an example (with the credentials removed).

# load the helper module
from AzureTableHelper import *

# create an account and set credentials
account = Account()
account.endPointUri = Uri('[Table Storage Url]')
account.accountSharedKey = '[Shared Key]'
account.accountName = '[Account Name]'

# create the helper with the account details
tableHelper = TableHelper(account)

# delete tables
tableHelper.DeleteAllTables()

# load tables from the AspProviders assembly 
tableHelper.LoadFromAssembly("AspProviders.dll")

# load tables from the DataModels assembly
tableHelper.LoadFromAssembly("DataModels.dll")

# print all the tables to the console
tableHelper.PrintTables()

There is a limitation to scripting tables this way, as Mark Seemann notes:

Currently, the script has one limitation: Deleting a table using the StorageClient API only marks the table for deletion, so the operation returns much to soon. This means that if you are trying to recreate a table by the same name, a conflict will occur, and the table will not be created. You can work around this limitation by waiting a little while and then run the script again.
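
Rather than re-running the script by hand, the waiting could be automated. The following is a minimal plain Python sketch of a retry with a growing delay; `TableConflict`, `create_with_retry` and `fake_create` are hypothetical names, and I have not checked which exception StorageClient actually raises in this situation.

```python
class TableConflict(Exception):
    """Hypothetical error raised while a deleted table is still being removed."""


def create_with_retry(create_table, name, sleep, attempts=5, delay_seconds=2.0):
    """Try create_table(name) up to `attempts` times, doubling the wait after each conflict."""
    for attempt in range(1, attempts + 1):
        try:
            create_table(name)
            return attempt
        except TableConflict:
            if attempt == attempts:
                raise                # give up and let the caller see the conflict
            sleep(delay_seconds)
            delay_seconds *= 2


# Fake service: the first two create calls conflict, the third succeeds.
calls = {"count": 0}
created = []
waits = []


def fake_create(name):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TableConflict(name)
    created.append(name)


attempts_used = create_with_retry(fake_create, "PostTable", waits.append)
```

In a real script the `sleep` argument would be `time.sleep`; passing it in keeps the sketch quick to run and easy to check.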

I think this part of the Azure framework is pretty exciting and I'm looking forward to checking out other parts.


Azure Table Storage in IronPython

February 20, 2009 14:14 by tarn

My goal was to write a bit of scaffolding to make using the storage service fun and easy from IronPython. The Windows Azure SDK comes with a library in the samples which does the low-level work of interfacing with the API and provides some nice classes to work with. There are methods to generate table schemas by reflecting on model classes, and another sample implements all the standard .NET providers.

I was hoping to write all the scaffolding and the model classes in IronPython but, in the first of a series of setbacks, I found the development storage server behaves differently from the cloud. For some reason you need to create tables on the development storage server using a command line tool, passing your model assemblies as arguments. Apparently this will be fixed soon but, trying to stay focused, I decided I'd have to write my models in C# for now.

Once I had an assembly with some models I could use the DevTableGen.exe command line tool that comes with the Azure SDK to create tables on my development storage server.

I don't think there are currently any good tools for visualizing and editing data, but I'm sure by the time it's released it will integrate into Server Explorer and Query Analyzer (or perhaps it'll just be a Firefox plug-in). I've seen a presentation where HTTP requests are hand coded to retrieve data, but I wasn't up for that and tried importing the sample library into an IronPython console. I got enough working to convince me everything was going to work out; I just needed to find out how to use extension methods in IronPython...

Much to my surprise and disappointment, consuming extension methods in IronPython is still difficult. There does appear to be a way to bind the extension methods, but I haven't seen an example binding all the Linq extension methods. This is a problem as it means Linq expression trees can't be easily built in IronPython. Passing expression trees as queries is ideal as the cloud can do the filtering, sorting and only return the data you want.   

I decided to put this problem on ice and write helpers in C# that return a List<T> of all the rows. This won't work well with lots of data, but it will work fine on small tables.
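
To show what this trade-off looks like in practice, here is a small plain Python sketch. `select_all` is a hypothetical stand-in for a helper that returns every row, and `Post` is a made-up row type. The filter runs on the client, so the whole table has already been transferred by the time the predicate is applied.

```python
from collections import namedtuple

# Made-up row type for illustration only.
Post = namedtuple("Post", ["name", "content"])


def select_all():
    """Hypothetical stand-in for a helper that returns every row in the table."""
    return [Post("Post Name %d" % i, "Some content for post %d" % i) for i in range(5)]


def select_where(predicate):
    """Filter on the client: every row is fetched before the predicate runs."""
    return [row for row in select_all() if predicate(row)]


matches = select_where(lambda row: row.name.endswith(("1", "3")))
names = sorted(row.name for row in matches)
```

With five rows this is harmless; with millions it is exactly the cost that passing an expression tree to the service would avoid.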

I wrapped the required credentials to connect to the service in an Account class, mainly so I could hard code the development credentials which are always the same.

public class Account
{
    public Uri EndPoint { get; set; }
    public String AccountName { get; set; }
    public String SharedKey { get; set; }

    public Account()
    {
        // development server defaults
        EndPoint = new Uri("http://127.0.0.1:10002/");
        AccountName = "devstoreaccount1";
        SharedKey = "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==";
    }
}

Later I will try creating models on the DLR, but for now I created them in a separate assembly using C#.

namespace DataModels
{
    public class PostDataModel : TableStorageEntity
    {
        public PostDataModel(string partitionKey, string rowKey)
            : base(partitionKey, rowKey)
        {
        }

        public PostDataModel()
            : base()
        {
            PartitionKey = Guid.NewGuid().ToString();
            RowKey = String.Empty;
        }


        public string Name
        {
            get;
            set;
        }

        public string Content
        {
            get;
            set;
        }

    }
}

A context that derives from TableStorageDataServiceContext is required. The property names in the context are used as table names, and the type of each property describes the table row. The DataServiceContext works a bit like the LinqToSql data context, keeping track of all the objects it returns.

namespace DataModels
{
    public class DataServiceContext : TableStorageDataServiceContext
    {
        public DataServiceContext(StorageAccountInfo accountInfo)
            : base(accountInfo)
        {
        }
        
        public IQueryable<PostDataModel> PostTable
        {
            get
            {
                return this.CreateQuery<PostDataModel>("PostTable");
            }
        }

        public IQueryable<PostDataModel> CommentTable
        {
            get
            {
                return this.CreateQuery<PostDataModel>("CommentTable");
            }
        }
    }
}

I wrote a generic model class to hide all the details and provide a simple wrapper for insert, select and delete operations on a table. There are a couple of tests in the solution that demonstrate how this works, but basically you just need a custom context (V), a model (T) and a table name. The generic model can then be used to insert, select and delete from the table.

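using System;
using System.Collections.Generic;
using System.Data.Services.Client;
using System.Linq;
using System.Reflection;
using Microsoft.Samples.ServiceHosting.StorageClient;

public class Model<T, V>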
    where V : TableStorageDataServiceContext
{
    private V _context;
    public StorageAccountInfo AccountInfo { get; set; }
    public TableStorage TableStorage { get; set; }
    public string TableName { get; set; }

    public Model(Account account, string tableName)
    {
        TableName = tableName;
        AccountInfo = new StorageAccountInfo(account.EndPoint, null, account.AccountName, account.SharedKey);
        TableStorage = TableStorage.Create(AccountInfo);
        _context = Activator.CreateInstance(typeof(V), new object[] { AccountInfo }) as V;
    }

    public List<T> Select()
    {
        MethodInfo field = _context.GetType().GetMethods().Where(f => f.Name == "get_" + TableName).FirstOrDefault();
        if (field == null) return null; // field doesn't exist
        IQueryable<T> fieldValue = (field.Invoke(_context, null) as DataServiceQuery<T>);
        var results = from c in fieldValue select c;
        TableStorageDataServiceQuery<T> query = new TableStorageDataServiceQuery<T>(results as DataServiceQuery<T>);
        IEnumerable<T> queryResults = query.ExecuteAllWithRetries();
        return queryResults.ToList();
    }

    public void Insert(object item)
    {
        _context.AddObject(TableName, item);
        _context.SaveChanges();
    }

    public void Delete(object item)
    {
        // AttachTo is not required if the item was created by the context
        _context.DeleteObject(item);
        _context.SaveChanges();
    }
}

We've done all the C# code; now let's see how it can all be tied together in an IronPython console.

IronPython 2.0 (2.0.0.0) on .NET 2.0.50727.3053
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> # import everything we need
>>>
>>> import clr
>>> clr.AddReference("DataModels.dll")
>>> clr.AddReference("DataHelper.dll")
>>> clr.AddReference("StorageClient.dll")
>>> from DataModels import *
>>> from DataHelper import *
>>> from Microsoft.Samples.ServiceHosting.StorageClient import *
>>>
>>> # create a generic model, using the default account (development server)
>>>
>>> model = Model[PostDataModel, DataServiceContext](Account(), "PostTable")
>>>
>>> # now we can add some data rows
>>>
>>> for i in range(5):
...    post = PostDataModel()
...    post.Name = "Post Name " + str(i)
...    post.Content = "Some content for post" + str(i)
...    model.Insert(post)
...
>>>
>>> # we can now read them back
>>>
>>> for post in model.Select():
...    print "Name", post.Name
...
Name Post Name 3
Name Post Name 1
Name Post Name 2
Name Post Name 0
Name Post Name 4
>>>
>>> # delete them all
>>>
>>> for post in model.Select():
...    model.Delete(post)
...
>>>
>>> # and finally ensure they have all been removed
>>>
>>> for post in model.Select():
...    print "Name", post.Name
...
>>>
>>>

 

I'm pretty excited with how it all worked out, despite the setbacks. In future posts hopefully I'll have a go creating tables in the cloud with models and context created on the DLR. I will also try and resolve using Linq Extension Methods in IronPython which are essential to building complex queries to be executed in the cloud. I'm also writing a very simple Azure MVC weblog app which I'll hopefully finish soon.  

The project can be downloaded here and I've also uploaded just the compiled DataHelper assembly.

 

I found these links useful writing this post:

Windows Azure Essential Links

Walkthrough: Simple Table Storage

Creating Azure Tables From Script