In a few weeks, on May 21, 2012, this year’s DBTest workshop will take place in conjunction with ACM SIGMOD in Scottsdale, Arizona. DBTest is a great forum for practitioners and academics alike to exchange ideas and experiences regarding quality aspects of data management systems.

The number of submissions was distinctly larger than in previous years, indicating a pent-up demand for solutions around reliability and testability of database systems. We ended up accepting 12 papers and had to turn down a number of strong and interesting submissions. Check out the program and book your tickets!

Looking forward to seeing you in Scottsdale!

Check out the latest publication from Greenplum Engineering at this year’s SMDB, ICDE’s workshop on Self-managing Database Systems. One of the authors was an intern last summer, and we asked him to investigate how data is redistributed during query processing in an MPP environment. In particular, are there patterns that can be discerned and used to lay out the data more systematically than the usual, somewhat hunch-based approach of looking for “frequent join attributes”? Sure enough, the answer is yes, and this paper describes how it can be done.

Find an electronic copy here.

got phd?

Posted: January 2, 2012 in Uncategorized

After a long day of interviews, I wrapped up with a candidate the other day. Among other things, I tend to ask our guests what they’ve learned about the team, the position, and the company. Turns out, this particular candidate kept close tabs on her interviewers and noticed that all of them had Ph.D. degrees in databases or a closely related field. “So, what’s up with that?” she asked. “Is this a requirement for the job?”

Well, fact is, about 30% of our core engine development team are Ph.D.’s – with even higher numbers in some teams. That said, and this might sound a bit unbelievable, we actually do not care much about degrees. So, how come we have such a high density of Ph.D.’s?

Well, it’s easy to confuse cause and symptom. Remember, I’ve written in the past about what kind of people I’m looking to hire: smart people who get stuff done. And although I value experience in engineers, it ranks lower than raw smarts and attitude. I know that’s quite a contrast to most of our competitors: the more established a company becomes, the more it slows down and focuses on tried-and-true rather than on can-do.

prototypical development

Posted: November 28, 2011 in Uncategorized

Ever wondered what the Golden Gate Bridge would look like if software engineers had been tasked to build it?

Chances are, after the first brainstorming, one engineer starts building a prototype, maybe over the weekend. One lane, using limestone, his favorite building material (after all, “limestone has been known to be the best building material for centuries”), nice little arches, maybe not exactly at the right position but good enough to get a first impression of what a bridge in this location could look like. Quickly, a few other excited engineers help out. They change the material midway to granite (after all, “granite is known to be the best building material for bridges for centuries”) and add a bicycle lane. A massive tower is added as a lookout for the tourists sure to visit the bridge every year: a key requirement, everybody agrees.

At the next brainstorming, the prototype is unveiled and by unanimous vote the brainstorming meetings are henceforth turned into status meetings. Program Management is delighted to see early results and upper management praises the success of agile development and rapid prototyping: “It would have taken our competitors months even just to select the site!” More resources are poured into the project and the one-car-one-bicycle-lane project is quickly advanced. To make up for the unfortunate choice of location, a sharp turn is inserted halfway: “hey, it worked for the Bay Bridge”. Adding another five lanes for traffic is postponed to a future release, together with widening the arches for ship traffic. Several segments need rebuilding in a different material even before the middle of the strait is reached, as granite turns out to be a great material for building castles but not so much for bridges.

DBAs the world over dread the day when their boss walks into their office and announces that it is time to expand the Enterprise Data Warehouse, the company’s crown jewel. While not a pleasant operation on single-node databases, it means major surgery in conventional MPP databases that deploy a large array of shared-nothing compute and storage nodes. The standard approach is to dump all the data, provision new capacity, and then reload all the data. What sounds rather simple is actually an impressive logistical feat, if all goes well. Weeks, if not months, in advance, elaborate project plans are developed that span a number of teams across the company, from the hardware department all the way to the business customers of the database; you need all hands on deck. The process itself requires up to several days of downtime for databases in the petabyte range, and again only if all goes well. In the event that one or more things go wrong, the crew will be scrambling to get either back to the original configuration or to a makeshift solution before the scheduled window of downtime expires and the business suffers from the outage.
The sheer difficulty of this operation makes many IT organizations put off an expansion as long as possible. That usually makes things even harder, as the system will be close to capacity when it finally needs to be expanded, and spare components or capacity will be harder to come by in the heat of the battle.
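To get a feel for why adding nodes touches so much data, here is a minimal sketch. It assumes rows are placed on nodes by hashing a key modulo the node count, a common scheme in shared-nothing systems; the function names and the placement rule are made up for illustration and are not taken from any particular product.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Hypothetical placement rule: hash the row key, take it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Share of rows whose home node changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)
    )
    return moved / len(keys)


# Synthetic row keys, purely for illustration.
keys = [f"row-{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 6):.0%} of rows change nodes going from 4 to 6 nodes")
```

Under this assumed rule, well over half of the rows end up on a different node after growing from four to six nodes, which is why the operation amounts to rewriting most of the database.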

With all that in mind, we developed a mechanism that allows expanding a Greenplum Database instance (1) without downtime, (2) without significant performance impact during the operation, and (3) in a transactionally consistent manner on top of it.

dbtest 2011

Posted: July 6, 2011 in Uncategorized

A few weeks ago, the latest edition of the DBTest workshop took place. As in the years before, the workshop was held in conjunction with SIGMOD, which is meant to draw a good crowd of practitioners. And draw it did: the gathering was well attended throughout the day by both academics and folks from industry! I would guess that DBTest has evolved into the workshop with the largest industry crowd in all of SIGMOD.

This year’s keynote presentation, by Glenn Paulley of Sybase, was centered around the question of ‘how much more complexity can database systems deal with?’ Quite an interesting outlook and effectively a call to arms to simplify and restructure database architecture as a whole. Some interesting stats: Microsoft was one of the main contributors, as was the case in previous years, and quite a number of papers were authored by attendees of last year’s Dagstuhl workshop on robust query processing.

So, we’ve had a seriously successful workshop. Now, where to go from here? It seems the organizers of the next edition(s) will face a couple of interesting challenges.

We’ve seen quite a number of papers co-authored by folks from our R&D organization. The latest addition is by Mohamed Soliman at this year’s SIGMOD conference.

Find the paper here.