6 posts categorized "memcached"

11/24/2009

memcached and the client: Database UDFs

NorthScale's own Patrick Galbraith has, for many years now, authored and maintained the MySQL, and now Drizzle, UDFs for memcached.  Last week, Patrick took this one step further with the latest release, version 1.1, which now includes support for "check and set" (a.k.a. CAS) operations.  

User Defined Functions (UDFs) are available for a number of different databases.  They let a stored procedure language or triggers execute external code loaded into the DB.  In the case of the memcached UDF, this means giving stored procedures the ability to call memcached operations.

The general idea here is pretty simple.  Most applications start with a database, though it's always possible to use web services or flat files.  Regardless of where the data is persisted, to keep the cache always up to date with the System of Record (SoR), one really, really simple approach is to propagate invalidations (i.e. deletes) to the cache whenever you update a record in the SoR.  Databases, whether single or sharded, are so popular for managing app data that they have a natural role in this pattern.  In the diagram below, when the application needs to update a record based upon user interaction (#1), the database can, if UDF-enabled and told how to do so, invalidate that data in the cache (#2).

UDF Pattern Diagram
 

This isn't for everything since multiple operations may not be enforced as a transaction from the application, but it's simple to set up and works for a great many apps.
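
To make the pattern concrete, here's a minimal sketch in Python.  It is not the UDF itself: plain dicts stand in for the database and for memcached, and every name in it (SystemOfRecord, cached_read and so on) is made up for illustration.  The update method plays the role of a UDF-enabled trigger, deleting the cached copy whenever a record changes.

```python
# Sketch of invalidate-on-write. Plain dicts stand in for the database
# (System of Record) and for memcached; all names here are hypothetical.

class SystemOfRecord:
    """Toy database whose update hook plays the role of a UDF-enabled trigger."""

    def __init__(self, cache):
        self.rows = {}
        self.cache = cache

    def update(self, key, value):
        self.rows[key] = value          # (#1) the application updates the record
        self.cache.pop(key, None)       # (#2) the "trigger" deletes the cached copy

    def read(self, key):
        return self.rows[key]


def cached_read(db, cache, key):
    """Read-through: serve from the cache, falling back to the database on a miss."""
    if key not in cache:
        cache[key] = db.read(key)
    return cache[key]


cache = {}
db = SystemOfRecord(cache)

db.update("user:1", "alice")
first = cached_read(db, cache, "user:1")    # miss, then populated from the database
db.update("user:1", "alicia")               # invalidates the cached "alice"
second = cached_read(db, cache, "user:1")   # miss again, so the new value is read
```

Because the database does the invalidation, the application never has to remember to delete the key itself; the next read simply misses and repopulates the cache.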

In addition to Patrick's excellent UDFs for MySQL and Drizzle, there is pgmemcache for PostgreSQL, and even a prototype of UDFs for Apache Derby (a.k.a. JavaDB).

Oh, and about that new CAS feature Patrick added to the MySQL/Drizzle UDF.  Most memcached users start with the small stuff: gets and sets.  They then find utility for operations like add.  Before long, they're wrestling with how to deal with distributed clients wanting to update an item.  At a high level, this is where "check and set" (a.k.a. CAS) operations come in.  Have a look at the original protocol.txt (or the binary protocol doc) to see how you may use this.  In particular, adding CAS allows one to implement lock-free algorithms frequently required when multiple systems want to update an item in a distributed system.
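
For a feel of what a lock-free update looks like, here's a rough sketch in Python.  It doesn't use the UDF or a real client: FakeCache is a hypothetical in-memory stand-in that mimics the gets/cas semantics described in protocol.txt (fetch a value along with a unique id, then store only if that id is still current), and increment_with_cas is an illustrative helper, not part of any library.

```python
# Sketch of a lock-free update with check-and-set. FakeCache is a hypothetical
# in-memory stand-in that mimics the gets/cas semantics from protocol.txt.

class FakeCache:
    def __init__(self):
        self._items = {}      # key -> (value, cas_id)
        self._next_cas = 1

    def set(self, key, value):
        # Every successful store gives the item a fresh unique id.
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1

    def gets(self, key):
        """Return (value, cas_id), or (None, None) if the key is absent."""
        return self._items.get(key, (None, None))

    def cas(self, key, value, cas_id):
        """Store value only if the item is unchanged since cas_id was fetched."""
        current = self._items.get(key)
        if current is None or current[1] != cas_id:
            return False
        self.set(key, value)
        return True


def increment_with_cas(cache, key, max_attempts=10):
    """Read, modify, and write back, retrying if another client got there first."""
    for _ in range(max_attempts):
        value, cas_id = cache.gets(key)
        if cas_id is None:
            raise KeyError(key)
        if cache.cas(key, value + 1, cas_id):
            return value + 1
    raise RuntimeError("too much contention on " + key)


cache = FakeCache()
cache.set("hits", 0)

# Simulate a race: fetch a CAS id, let another client write, then try to store.
_, stale_id = cache.gets("hits")
cache.set("hits", 5)                            # a competing client's update
stale_write = cache.cas("hits", 1, stale_id)    # rejected: the item has changed
result = increment_with_cas(cache, "hits")      # reads fresh state, then stores
```

The point is that nobody holds a lock: a client that loses the race just finds out its write was refused and tries again against the current value.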

Jump in on the list off of memcached.org or on #memcached on IRC on freenode if you're looking for more information.

10/26/2009

building all the time

Recently, when Patrick Galbraith and I put together the next moxi release, we spent a bit of time getting the build clean on a number of platforms and with a number of compilers.  Continually building and testing on multiple platforms helps ensure the usefulness, quality and longevity of the code.
This is something all of us at NorthScale believe is good for the projects we lead and contribute to.  Dustin Sallings has long been doing this for memcached, as you can see from the memcached wiki and the build farm itself.  All of us at NorthScale have continued this effort joined by community contributions.  As you can see, it's quite a comprehensive list.  We do this for other projects too.  For the memcached proxy, moxi, we have another build farm.
For those not familiar with continuous integration, buildbot allows us to shorten the time between new code and issues found in various build environments.  Every time a developer commits a change*, all of these platforms will try to build and test memcached.  If there is a problem, we can spot it right away and fix it, so build problems on other platforms don't linger.  There are many benefits: it keeps the code 'ready to release', some platforms may catch errors at compile time that other platforms do not catch, etc.
To give an example of how this has worked for us in practice, if someone happens upon a platform with an issue, we of course address the issue itself (usually by asking them to file bugs and provide a test case), but we also ask for help adding to the build farm.  The Gentoo project, for instance, bundles memcached and covers many, many platforms.  An issue came up recently with Gentoo for the ARM architecture.  In the process, Dustin asked the contributor, Robin Johnson, if they could set up a builder, so we now have a builder for Gentoo on ARM.
For Patrick and me in this last moxi release, this meant adding some new builders to moxi's build farm, turning up the compiler warnings and fixing a number of bite-sized bugs.  While many of these bugs are more along the lines of warnings that only compilers in a pedantic mode would complain about, some others pointed to type-safety issues, which are the kinds of places in software where bugs like to hide.  Some caulking and sealing will keep the bugs out.
In the end, NorthScale's goal here is to maintain high quality output for memcached, moxi, libmemcached and other projects we either lead or contribute to.  As Dustin's report card blog post shows, we've made progress against our goals already; buildbot's steady watch on our tree should help keep us there.

* Technically, the process we try to follow is to run a "buildbot try" (also contributed to by Dustin) first when making changes, so we keep a tree which is buildable on all platforms all of the time.  It allows a developer who hasn't even committed a change to test it.  It just works with git to generate a diff, then patches the tree against some common history.

10/23/2009

Third Stage (and not the repeatedly-delayed album from Boston)

As a nerdy adolescent boy growing up in Baton Rouge, Louisiana I discovered that I really loved computers – and computer software in particular. Creating something from nothing and instantly seeing the results in all their black-and-green, 64 column-by-16 line, ASCII-character glory gave me a level of satisfaction very few other things did. While I was the last kid picked for sports teams in junior-high school PE class (behind the girls – and I’m not exaggerating), put a keyboard in front of me and I could do things few others could. I dreamed of one day living and working in Silicon Valley – helping to create the technology that brought much joy to my life.

Fast forward about twenty-five years, and I am living the dream. I can remember like it was yesterday the first time I drove down Interstate 280 in Cupertino and saw the Apple campus out the driver’s side window; then looped up highway 85 and down 101 to see Intel’s headquarters. I consider myself one of the luckiest guys alive. I get paid to do what I truly love doing – creating great software and building great software companies that allow others to do the same.

Over the course of my career, I’ve been fortunate to directly participate in two fundamental shifts in computing technology.

The first was the transition from mini- to micro-computing. Although I logged my fair share of VAX and Data General Eclipse time, I joined the fray in earnest as microcomputers were already entering the scene. I was less a creator, and more a participant in and beneficiary of that transition.

The shift from character-based to GUI-based user interfaces; the emergence of PC LANs and networked storage; the adoption of object-oriented design and programming languages; the emergence of client-server and subsequent shift to n-tier application architectures; and the proliferation of virtual machine software for Intel-architecture platforms were all what I would consider evolutionary steps in computing technology. Very cool stuff - enjoyed helping drive those transitions - but evolutionary.

The second fundamental shift was the emergence of the Internet, enabling global connectivity of computing devices through simple, open network protocols; and the establishment of the World Wide Web which rides on top. Having made the utterly ridiculous decisions to return to graduate business school and to pursue a sideshow career in investment banking just as all that good stuff was going down, I was once again more participant than instigator. Hindsight is 20-20, I suppose.

But a third major shift is happening; a shift that will mark the third stage of my career. And I’m not missing this one. Fortunately, I’ve been at the right place, at the right time, to help play a leadership role in driving this transition. The emergence of “cloud computing” will be bigger and more impactful on the computing landscape than all the previous transitions above combined. As a jaded, buzz-word overloaded, skeptical, long-term member of this community, I actually believe that assertion right down to my core.

In 2004 I started a company, with Xun Wilson Huang, called Akimbi Systems. Acquired in 2006 by VMware (where I remained for a couple years), the technology we built is now being used as the foundation of VMware’s cloud provisioning platform. Virtualization (server, storage and networking) is a key enabling technology making the drive to cloud computing possible, but there are other, key missing ingredients – mostly “up the stack.”

About a month ago I joined the team here at NorthScale to help build and bring to market some of those missing ingredients; and to help enterprise IT organizations understand and embrace the cloud computing model. I’ve been preparing my whole life for this opportunity and we are going to do it right.

We aren’t yet ready to fully detail our vision, strategy and products to the market, but we are doing some pretty amazing things here and we can’t wait to tell the world.

“but that’s not what I came to tell you about.

Came to talk about the draft.”

The development of critical infrastructure software using the open source model is the future, and increasingly the present, of software development. We plan to embrace and actively support a number of open source projects in our work here at NorthScale, and memcached is one of them.

We believe deeply in the power of the open source software development model and we are going to do everything in our power to respect, support, contribute back to and enhance the vitality of the memcached community; and the same goes for any other project we participate in.

The memcached community deserves credit for creating and enhancing a software system that is currently used by thousands of web applications, including substantially all of the top 20 web applications (by traffic count) on this planet (including Facebook, MySpace, New York Times, Google, Yahoo!, LinkedIn, Craigslist, eBay, salesforce.com). Rather than try to co-opt or claim credit for the work of the community, our goal is to recognize that great work; and to continue to do our part to support the efforts of these people while helping to improve the software and contribute those improvements right back to the project. It is the right thing to do.

I can’t wait to share my experiences with you in the coming months and years and I’d love to hear from anyone that shares our passion for cloud computing and open source software development.

10/21/2009

More memcached

Our intrepid NorthScalers have been doing some interesting work recently in memcached land...

Last week, Dustin Sallings announced his memcached server implementation in Erlang, called EMemcached.  Besides being a cool project, there's a surprising amount of interest in the mixture of memcached and Erlang, as you can see from the comments on Dustin's announcement post.

Today, Patrick Galbraith announced that the memcached UDFs are now integrated into the Drizzle project's mainline.  Drizzle is an interesting fork of MySQL and these memcached UDFs (which were originally inspired by the memcached UDFs for MySQL) make it easier than ever to work with memcached from the Drizzle RDBMS.

As a last note, we're heads down cranking on incredibly cool Scale Out Data infrastructure.  Stay tuned!

08/07/2009

MrRoboto: The memcached AMI story

Not so long ago, we went to lunch at a sushi joint on Castro Street in Mountain View across from NorthScale's modest HQ.  On the way over we were talking about Japanese robatayaki and Dave told a story about an Oscar speech for a Japanese animator who works for a company named Robot.  I have plenty of other stories about the HQ, like describing a "Pink" day... but that's not the purpose of this post....

It was over lunch that Dustin, Dave, Steve and I, right after the memcached 1.4.0 release, said we should just throw together an AMI with memcached bundled into it for people to use.  We didn't want it to be modified in any way from what you could get from memcached.org.... it'd just be built by some folks who work on and use memcached.  It'd be designed to simply boot and use as much of the available memory on the system as possible to run a nice big memcached instance.  We know lots of developers and we know that some of them use EC2 instances to run memcached.  We would just make it simpler to obtain.
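
As a rough illustration of the "use as much memory as you can" idea, here's a small Python sketch that builds a memcached command line from the machine's total memory.  This is not how the AMI actually does it: the memcached_args helper, the 512 MB set aside for the OS and the 64 MB floor are all assumptions for the example.  The only real pieces are memcached's -m flag (cache size in megabytes) and -p flag (TCP port).

```python
# Sketch: size a memcached instance from total system memory.
# The helper name, the reserve, and the floor are assumptions for illustration.

def memcached_args(total_mb, reserve_mb=512, port=11211):
    """Build a memcached command line that leaves reserve_mb for the OS."""
    cache_mb = max(64, total_mb - reserve_mb)   # never go below a 64 MB cache
    return ["memcached", "-m", str(cache_mb), "-p", str(port)]


# Example: an instance with 7680 MB of RAM.
args = memcached_args(7680)
small = memcached_args(256)   # tiny machine: the floor applies
```

A boot script could hand a list like this straight to something like subprocess, with the total memory read from the operating system.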

Then, I opened my mouth and said I thought I could throw together an even better AMI, which used some of the technology I'd worked with in the Sun Web Stack to simplify an EC2 deployment even further....  Thus began MrRoboto.

As I said above, we know lots of developers, so we know how they use memcached.  As people get started with memcached, they tend to run it from a terminal with "-vvv".  That level of verbosity is really only something a memcached developer could love.  It's a bit too much for your average developer using Rails, PHP or Java.

Speak up about an idea at a startup, and you've just given yourself a software development project to run!

The concept was we'd mash up a DTrace script, Dustin's slosh server modified to run said script when a user asks for it, and a simple AJAX browser UI getting info to show it off to a user.

Rather than try to describe it, have a look at this screencast:

memcached AMI with tools from NorthScale from Matt Ingenthron on Vimeo.

Effectively, this is "-vvv" in a browser.

It's really, really basic for a first release, but we can think of a few places to grow from here now that the base is in place.  While it's not immediately apparent from this simple, but useful "keyflow", we can do a lot of other interesting things.  Some other experiments are already working.

Other than Trond Norbye and myself, I don't think many people have poked at memcached with DTrace even though it's been in the code for over a year.  I won't go into huge detail on DTrace here, but the huge advantage you get for 'free' with DTrace is that it is simple and safe to use in production, and the overhead of a probe which is not enabled is nil.  That's exactly how we used it in this AMI.  If you aren't looking at it with a browser, there isn't any overhead.  Even if you are looking, the overhead is very low, and out of the critical path of memcached's execution.  In my humble opinion, the technology behind DTrace is awesome; it just needs some published examples and tools to help people ramp up.

Thanks much to my colleagues Dustin Sallings for the slosh example, Rod Ebrahimi for helping to get the UI into shape and Steve Yen and Dave Neilsen for the brainstorming and enthusiasm behind the project.  Also, thanks to Trond Norbye for authoring the DTrace probes, Chad Mynhier and Adam Leventhal for helping with my simple DTrace-fu issues and Rich McDougall/Jim Mauro and Brendan Gregg (initially of DTrace Toolkit fame and later of Fishworks analytics fame) for the tutelage on DTrace over the years.

More to come in future posts on project MrRoboto, but if this is something you find cool or useful... or you have an idea for something you'd like to see in a future MrRoboto AMI, please let us know!

p.s.: the AMI is based on OpenSolaris.  Bugs in Amazon's EC2 management console (currently in Beta) keep it listed as "Other Linux", which is the default for anything the console can't identify with a substring.

07/24/2009

probing memcached

For those of you who've peeked into the memcached source code, you've probably seen the 30 or so odd MEMCACHED_FOOBAR() statistics calls or probes strewn throughout the codebase.  These are actually DTrace probes, and Matt Ingenthron's put up a post describing some of the basics on how to use them.

http://blogs.ingenthron.org/matt