how Hadoop can disrupt the database industry

Hardly any book has attracted more attention among software companies than Clayton Christensen’s “The Innovator’s Dilemma”. Some companies even welcome it as a sign of “innovative spirit” when engineers slap a product manager over the head with it (figuratively, that is) whenever he or she presents an idea to increase business value along the lines of the existing corporate strategy.

In a nutshell, the Dilemma is the observation that established industries are more likely to invest in existing, proven, but aging technologies than to look for new, well, disruptive innovations that are initially risky or even outright economically unattractive. The incumbents miss the boat, and their business model gets disrupted by a competitor who took the plunge. Eventually, the disrupter puts the incumbents out of business.

For a technology to disrupt an existing industry, a few things must happen:

  1. The existing products have outgrown the actual market expectations, e.g., they deliver more features than customers find useful.
  2. The new technology changes one fundamental parameter in the equation, often at the cost of some other fundamental property such as performance, … Continue reading
Posted in Architecture | 2 Comments

smdb’13 — the votes are in

The reviewing period for this year’s IEEE Workshop on Self-Managing Database Systems (SMDB) is over, and the top papers have been determined! I’m very excited we were able to compile a strong program. As the following Wordle constructed from the abstracts tells you... well, actually, I’ll let you come up with your own interpretation:

[Image: Wordle constructed from the abstracts]

Big shout-out to the members of the PC, who’ve done an outstanding job reviewing a pile of papers in record time! Thank you!

And without further ado, here’s the list: Continue reading

Posted in Uncategorized | Leave a comment

testing the accuracy of query optimizers

A while back, I had a very interesting conversation with Jack, an application developer at a large software company in the area; a person on the other side of the database, so to speak.

He maintains that database application programmers who have to support a number of database systems have long since developed a kind of taxonomy of query optimizers: they know which systems have ‘good’ and which have ‘bad’ optimizers. They know on which systems queries need “tuning”, and even to what extent. Apparently, there is at least one system that ‘gets it almost always right’, as Jack would put it, and lots that are way off by default. When asked, however, how he’d quantify these differences, Jack simply shrugged: ‘It’s something you develop over time; can’t give you a number.’

We talked about measuring the quality of an optimizer in an earlier post, where I put forward a number of desiderata for an optimizer benchmark.

Continue reading

Posted in Foundation, Optimizer Technology | 1 Comment

big data vs. pundits – 1 : 0

So, after all, it wasn’t such a ‘razor-tight’ presidential race last night. Not that the Big Data camp expected one in the first place. Nate Silver, the poster child of predictive analytics in the political arena, made a pretty convincing case for why Obama was very likely to win; he actually put the probability at around 90%. Just for kicks, compare this to Kimberley Strassel at the WSJ, who in all earnest made the case for Romney just one day before the election.

In short, independent of the political outcome of this election, it’s yet another great showcase of Big Data winning hands down over punditry!

P.S. If you don’t have a WSJ.com subscription, you can also get a copy of Strassel’s... well, what am I going to call it? Article?

Posted in Uncategorized | 2 Comments

socc’12

As expected, holding SoCC as a stand-alone conference made it difficult to attract what I call “casual attendees”: folks who would happily spend a day or two after a big conference but won’t take time out of their busy schedules for intercontinental travel just to attend this symposium. Consequently, most attendees were either authors themselves or locals willing to spend an extra hour in morning rush-hour traffic to get to San Jose.

Anyway, the conference was well worth the time, even though I’d have liked to see more talks related to databases and data management.

One of the highlights was definitely Scott Shenker’s keynote on Software-Defined Networking (SDN). Making networking sound interesting and appealing to a larger crowd is quite a feat. Luckily, Shenker is immensely talented at breaking down a complex technical subject for a fairly diverse crowd that, let’s face it, just isn’t that into networking in the first place. Continue reading

Posted in Uncategorized | Leave a comment

smdb’13 — spread the word

ICDE’s Workshop on Self-Managing Database Systems (SMDB) has long been the venue for all things concerning the self-management of databases. However, only with the advent of Big Data and the growing prevalence of hard-to-manage data sizes, data formats, and data fire hoses is self-management finally taking center stage!

We’re excited to organize this gathering and call on all researchers and practitioners to consider submitting novel ideas, war stories, and vision and experience papers. In the spirit of a true workshop, we’re looking for work in progress and have limited submissions to 6 pages.

Submission deadlines are as follows:
11/19/2012 — abstract submission
11/26/2012 — full paper due

Help us spread the word and let your coworkers or students know. Feel free to print the flyer and put it on your office door or your message board!

For details, visit the workshop website at http://smdb2013.cs.pitt.edu

Posted in Uncategorized | Leave a comment

how to build a query optimizer for big data

It’s time to talk about what we’ve been up to!

[Image: an orca]

In a series of articles, I’ll describe the motivation and background, as well as the engineering tools and practices we developed over the past couple of years, to tackle one of those once-in-a-lifetime projects that get engineers truly excited: building a query optimizer from scratch.

Every database vendor at some point has to redesign one large component or another of its system. When it comes to the query optimizer, all of them have refurbished, rewritten, or remodeled theirs over the past 15 years. The first really big splash in this category was made by Microsoft with its rewrite of the entire query processor for the 7.0 release of SQL Server between 1994 and 1998. This initiative was instrumental in taking the product from negligible revenue to a billion-dollar-a-year business in only two major releases. Others followed suit, but as far as I can tell, none was similarly radical; most were more a matter of refurbishing existing structures. If you’ve been part of any such initiative at, say, Oracle, I’d really like to buy you a coffee and get some insights into the software engineering aspects of your project: pitfalls, ambitions, team dynamics, etc.

Anyway, for a startup like Greenplum, rebuilding an entire component is a much dicier decision; suffice it to say, a lot of convincing was needed before upper management gave the green light to hire a team of engineers, design a new optimizer, and start coding. Now that the product is shaping up and we’re on the home stretch, it’s time to review some of the lessons learned! What’s with the whale, you ask? You’ll see.

So, stay tuned for a series of posts chronicling an exciting journey!

Posted in Optimizer Technology | Leave a comment

in the rearview mirror: dbtest 2012

Organizing DBTest together with my partner in crime, Eric Lo of The Hong Kong Polytechnic University, was a great experience. We have both long been passionate about developing test methodologies for database systems of all shapes and sizes, so volunteering to organize the database testing workshop seemed very fitting!

We managed to solicit a total of 26 submissions, which is an all-time high as far as I can tell. While impressive in itself, this meant we didn’t have enough people on the PC to keep the review load as low as we had originally promised. Luckily, our PC members proved to be great sports and agreed to review pretty much double the number of papers we had anticipated. Thank you very much! As a result, we managed to put together a very strong program. After some deliberation, we decided to include a whopping 12 papers and to shorten the presentation times rather than reject several truly outstanding papers.

Continue reading

Posted in Uncategorized | Leave a comment

impressions from SIGMOD’12

This year’s SIGMOD conference was a good excuse to visit Phoenix, Arizona. As it turned out, choosing Scottsdale as the venue was a pretty good call: I prefer conferences and workshops to be held in places without (m)any tourist attractions or distractions within walking distance, as this keeps the crowd together; there’s simply nowhere for people to walk off to. A punishing outside temperature of over 100°F provided an additional incentive to stay within the confines of the conference hotel.

Continue reading

Posted in Uncategorized | Leave a comment

dbtest’12: call for participation

In a few weeks, on May 21, 2012, this year’s DBTest workshop will take place in Scottsdale, Arizona, co-located with SIGMOD. DBTest is a great forum for practitioners and academics alike to exchange ideas and experiences regarding the quality aspects of data management systems.

The number of submissions was distinctly larger than in previous years, indicating pent-up demand for solutions around the reliability and testability of database systems. We ended up accepting 12 papers and had to turn down a number of strong and interesting submissions. Check out the program and book your tickets!

Looking forward to seeing you in Scottsdale!

Posted in Uncategorized | Leave a comment