Archive for February, 2007

Are There Any Good Programmers?

February 28, 2007

Jeff Atwood is writing about the dearth of programming talent out there.  Not “good” programmers, or even “satisfactory” ones, but programmers with any talent at all.  A couple of highlights:

  • Fewer than 1% of applicants for any programming job can write code at all; the rest “…can’t write any code whatsoever.”
  • A large number of applicants with degrees in Computer Science cannot handle the simple request “write a loop that counts from 1 to 10.”
  • Most candidates can’t handle even the tiniest of programming problems.
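
For a sense of scale, here is roughly what that loop request is asking for.  This is a minimal sketch in Python; the language is my choice, since the request doesn’t name one, and the function name is just for illustration.

```python
# Count from 1 to 10, inclusive. range() excludes its stop value,
# so the stop has to be 11 rather than 10.
def count_one_to_ten():
    return list(range(1, 11))

for number in count_one_to_ten():
    print(number)
```

The only trap is the off-by-one at the upper bound, which is about as tiny as a programming problem gets.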

I can’t say that I’m that surprised.  My inbox fills up with unqualified candidates pretty quickly when I’m hiring.  I’ve received a lot of bad resumes.  My favorite was from a hot dog street vendor who was applying for a Director-level position in my technology practice.

Whenever you post a position you will get an immediate, and usually overwhelming, response.  In my experience you can probably toss all the resumes you get within the first 48 hours of posting a job and save yourself the headache.  The vast majority of those resumes are sent by job hunters who will apply to anything that moves.

Finding adequate talent is tough and finding great talent can be a heroic effort.  To find the best technical talent you need to always be in recruiting mode.  Bring in interns (and actually use them to do real work) and network as much as you can in the industry.  You are more likely to find your next star programmer, project manager or any other hire through the connections you make than through postings on job boards.

The answer to the question of this post is “Yes, there are plenty of good programmers.”  But they don’t stay out on the market long.  Many don’t go on the market at all as they transition from one employer to another through their network.

Bad Agile Evangelists – How To Not Attract Flies Using Vinegar

February 26, 2007

I remember sitting in a room full of IT professionals listening to an accomplished agile developer start a presentation.  He had authored his own book and worked for a well-known consulting firm with a reputation for following an agile methodology on its projects.  Bright fellow.  He was barely into his talk when he said “I’m going to take your sacred cows and kill them in front of you, grind them up into hamburger and feed them to you.”

Hmmmm.  I remember thinking “That’s a bad way to start a presentation.”  Everyone was against him at that point.  Except me. 

I thought that was a rather militant way to open a presentation, but I wanted to hear some of what he was saying.  I was using a methodology very close to what he was explaining.  However, whenever I chimed in with comments on how my teams were applying the same techniques, he would showcase me to the rest of the audience, basically saying “why can’t you be more like Ben?”

Ugh.  I didn’t know if I wanted to appear to be taking sides with this guy.  Sure, everyone has opportunities to improve the way they do things, but insulting them is not the right way to sway them.

Recently I viewed a talk on Agile Testing, which I found through a series of videos reviewed over at Rise Again, where Elisabeth Hendrickson recounted hearing Kent Beck, the Father of eXtreme Programming, say “‘QA people are a throwback to Tayloristic, scientific, time and motion management…and that [they] are all irrelevant in the brave new world of eXtreme Programming’…he said this to a standing room only audience of quality assurance professionals.”

Now I may be going out on a limb here, but I’m guessing that neither Kent Beck nor the consultant who shall remain nameless has read Dale Carnegie’s How to Win Friends and Influence People.

The one thing that you don’t want to do when you are trying to persuade someone to your point of view is to put them on the defensive at the very start.  These two very smart individuals threw away their audiences from the very beginning.  That’s not too bright.  Other anecdotes I’ve heard tell similar stories.

I’ve read Kent’s book and it’s phenomenal.  If you haven’t read it you should.  If you aren’t applying at least some of the XP methodologies you need to.  These men and women have a lot of valuable techniques that can and should be used, even if it is in spite of the fire-and-brimstone arguments made by some evangelists.

How to Avoid The 3 Common Mistakes Beginners Make When Architecting the Data Warehouse

February 23, 2007

Rick Sherman wrote about the three common mistakes beginners make when architecting their DW/BI platform.  Though he doesn’t state it explicitly, these mistakes are made by data architects who are used to designing transactional systems and not reporting systems.  My experience echoes his.  It can take time, and lots of conversations, to help developers shift to a different paradigm.

These are the three mistakes along with my perspective on each.  

1. Letting enterprise applications inspire the architecture.

The applications used for running the business are meant for capturing data about the business, not for running reports.  Most applications treat reporting on the data as an afterthought.  The mindset is that if we can get it in then we can get it out.  The problem is that this doesn’t make for an effective reporting layer:  usability, efficiency and maintainability as a reporting environment all suffer.  Also, massive reporting can wreak havoc on an OLTP system as it locks records and holds tables hostage until the query is complete.

BI vendors help foster some of this attitude with their sales force.  During the sales process, one of their tactics is to take your existing data, pump it through their tool and show reports within a few days (with 24 hours being the goal).  They usually leave the source schemas as is and make the necessary transformations within the BI tool instead. 

It’s a powerful sales technique because the decision makers see their data in a way they never had before and it all looks soooo easy.  This is done with a small subset of data in a very controlled environment.  These demos do give the impression that you can use your existing data schemas as-is.  However, you don’t want to turn your BI tool into your ETL tool.

2. Engaging in DW schema wars.

Those who live and breathe 3NF are taught that denormalizing data is bad practice.  It’s unheard of!  Heck, many have at least tried to get their models into 4NF (even 5NF?) at some point in their career.  I have.

Explaining why normalizing the reporting layer is bad can be a challenge.  I can’t tell you how many conversations I’ve had with data architects to get them on board with denormalizing the data for a data mart.  There have been times where I had convinced someone only to have them come back later to pick up the argument.

People tend to take a stand on what they know because they don’t know what they don’t know.  It boils down to using the right tool, or model, for the job.  On its way from the source systems to the BI tool, the data is going to have several landing spots.  Most stages will require entity-relationship modelling, while the destination used for reporting will require dimensional modelling (denormalizing the data).
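
To make the difference concrete, here is a minimal sketch of the two shapes using plain Python dictionaries.  Every table, column and value name below is made up for illustration; a real warehouse would do this in the database and the ETL tool.

```python
# Normalized source (entity-relationship style): region lives in its own
# table and customers point to it by key. All names here are hypothetical.
src_regions = {10: "Midwest", 20: "West"}
src_customers = {
    1: {"name": "Acme", "region_id": 10},
    2: {"name": "Globex", "region_id": 20},
}
src_orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
]

# Dimension: the region name is copied onto every customer row
# (denormalized), so a report never joins through to a region table.
dim_customer = {
    cid: {"customer_name": c["name"], "region": src_regions[c["region_id"]]}
    for cid, c in src_customers.items()
}

# Fact: one row per order, holding the measure plus the dimension key.
fact_sales = [
    {"customer_key": o["customer_id"], "amount": o["amount"]}
    for o in src_orders
]

def sales_by_region(facts, customers):
    """Total the measure by a dimension attribute with a single lookup."""
    totals = {}
    for row in facts:
        region = customers[row["customer_key"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals

print(sales_by_region(fact_sales, dim_customer))
```

The region name is now stored redundantly, which is exactly what a 3NF purist objects to.  The payoff is that the reporting query only ever goes one hop from the fact to the dimension.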

3. Snubbing summary tables.

Here I don’t disagree with Rick.  However, I do believe it is a good idea to keep the grain of the data as low as possible without sacrificing performance and integrity.  It really depends on the data, how it will be used and what the requirements are.

But there is nothing wrong with summary tables.  All reporting at the BI layer is aggregating on some level.  It’s good practice to make that happen in the ETL process instead of on the fly if the details aren’t needed for drill throughs.  Better to have the aggregation done once instead of every time the report is run.
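
As a rough sketch of what “done once” looks like, the example below uses Python’s built-in sqlite3 module as a stand-in for the warehouse.  The table names, columns and sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (sale_date TEXT, store TEXT, amount REAL);
    INSERT INTO sales_detail VALUES
        ('2007-02-01', 'North', 100.0),
        ('2007-02-01', 'North', 50.0),
        ('2007-02-01', 'South', 75.0),
        ('2007-02-02', 'North', 20.0);

    -- Built once during the load, not each time a report runs.
    CREATE TABLE sales_daily_summary AS
        SELECT sale_date, store,
               SUM(amount) AS total_amount,
               COUNT(*)    AS sale_count
        FROM sales_detail
        GROUP BY sale_date, store;
""")

def daily_totals(connection):
    """The report reads pre-aggregated rows; no GROUP BY at query time."""
    return connection.execute(
        "SELECT sale_date, store, total_amount, sale_count "
        "FROM sales_daily_summary ORDER BY sale_date, store"
    ).fetchall()

for row in daily_totals(conn):
    print(row)
```

The detail table is still there if a drill through needs it.  The summary just keeps the common report from re-aggregating the same rows on every run.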

The best way to avoid these mistakes is to make sure you understand that a fast, scalable and reliable DW/DM/BI environment calls for different data models and architectures than an OLTP one.

These Are Not The 10 Largest Databases in the World

February 22, 2007

I read the headline on the 10 Largest Databases in the World over at Business Intelligence Low Down (BILD) and got excited.  For me it would be a dream to wrangle with some of the hugest data sets on the planet.  So I was eager to read some of the stats of these ginormous tupleplexes.

Unfortunately, as I read, the disappointment started to set in.  As I read more I started to feel a bit cheated.

Here’s how they started off the list:

#10 – The Library of Congress.  The vast majority of it isn’t even digitized.  If it were, the estimate in the post would place it around 20 terabytes.  20 terabytes?

#9 – The CIA.  Now I’m sure these guys and gals have enormous amounts of data, but BILD doesn’t even take a guess at how big it would be.  How does this make the list if there’s no measurement?

#8 – Amazon.  When I read “the world’s biggest retail store” I had to stop and do a double take.  One word for you:  Walmart.  Compare Amazon’s 59 million customers with the 80% of the U.S. population that shops at Walmart at least once a year.

Walmart’s data center, dubbed Area 71,  had about a half petabyte of data in 2004. Now Walmart is highly secretive about their data so there is no updated information on its size that I could find.  However it’s safe to assume that as their revenue rises, so does the information they collect. 

Since 2004 Walmart’s revenue has increased by about 34%.  Projecting that growth onto their data, we’re looking at about 174 terabytes in growth.  That would be on top of all the information they continued to collect year after year on their existing customer base, which represents the baseline revenue from 2004.  My guess is that they are knocking on the door of a petabyte right about now.
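
Here is that back-of-the-envelope projection spelled out.  It rests on two assumptions: half a petabyte is read as 512 terabytes (binary units, which is what makes the 174 figure work out), and data volume grows in step with revenue.  Neither is a published Walmart number.

```python
# Assumptions, not Walmart figures: half a petabyte is taken as 512 TB,
# and stored data is assumed to grow at the same rate as revenue.
BASELINE_TB_2004 = 512
REVENUE_GROWTH = 0.34

growth_tb = BASELINE_TB_2004 * REVENUE_GROWTH
projected_tb = BASELINE_TB_2004 + growth_tb

print(f"Growth from new revenue: about {growth_tb:.0f} TB")
print(f"Baseline plus growth: about {projected_tb:.0f} TB")
```

That lands somewhere under 700 terabytes from revenue growth alone.  The rest of the distance to a petabyte is my guess about several more years of history piling up on the existing customer base.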

#7 – YouTube.  They are listed because they have about 45 terabytes of video.  The problem with that number is that all that data isn’t searchable.  That’s not a database, that’s storage.

OK, I’m totally frustrated at this point in the post.  So I decided to do a little searching and cracked open Google.  In addition to Walmart (which blows away items 3-10 on their list), here are some additional numbers I found off the cuff:

  • At a half petabyte in size, the Stanford Linear Accelerator Center had the largest database in 2002 (5 years ago).
  • The U.S. National Security Agency (NSA) collects information from AT&T, SBC, Verizon and BellSouth.  According to Wikipedia, the NSA Call Database has 1.9 trillion records (oddly, that looks like the same figure this Top 10 post lists for AT&T alone).  Regardless, a database that combines AT&T’s records with those of three other carriers has to be bigger than AT&T’s (BILD’s #3) by simple arithmetic.
  • The Winter Group puts out their Top 10 of the World’s Largest Databases for 2005.  Commercial databases are topping 100 terabytes and peak workload sees over 1,000,000,000 SQL statements an hour.

And this was just from cursory searching.  With just a little more effort I’d guess there are at least a dozen others that would bump 80% of this list out of contention.   

So from where I sit the folks over at BILD got it wrong – way wrong.  And in more ways than one. 

This is not the list of the 10 Largest Databases in the World.  

The Second Thing You Should Do When Starting a Data Warehouse Project

February 20, 2007

The first thing to do in a data warehouse project is to define why it’s being built in the first place.  This should be tied to the business strategy and have very specific critical success factors established.  The data warehouse should provide both measurable (quantitative) and intangible (qualitative) business value.  Basically this sets the direction, goals and measurements for success of the project.

The second thing that needs to be done is to define the reports that will be produced by the warehouse/mart(s).  This will prove to be the litmus test for what you did in your first step and set the stage for everything else you do in the project.

After all, the sole purpose for creating a data warehouse is to enable the decision makers of an organization to make decisions based on how the company is performing.  If the leaders of an organization are not getting actionable information then there’s no point in going through all this effort. 

Establishing the business need is valuable, but on its own it’s just words on the page.  You need to get report mock-ups in front of whoever will end up using them when they go on-line.  These mock-ups should be good enough to be interpreted and evaluated visually.  Some reports will already exist in Excel or in a legacy system.  Others will need to be created from scratch.

The process of defining these reports will have the following benefits:

  • Keeps the key stakeholders of the project actively engaged in the process.  They can wrap their arms around this.
  • Validates most of your success factors.  The results of the reports should tie directly to many of those defined in the first step.
  • Identifies what functions and features will be needed in your reporting tool – static reports, drill through, customizations, method of delivery, format, etc.
  • Identifies which reports are likely to be more challenging than others and which to pay special attention to.  Enables you to start mobilizing your team to tackle potential setbacks early.
  • Defines most, if not all, of your dimensions and facts, and guides how they should be designed in the data warehouse/mart and what level of detail (grain) they should go down to.
  • Gives a good idea on how the metadata/objects in the reporting tool will need to be structured.
  • Gives you leads as to where the data currently resides as a natural part of this process (bigger deal in larger organizations with many data silos).
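
As a small illustration of how report definitions fall out into dimensions and facts, here is a sketch that treats each mock-up as a list of the measures it shows and the dimensions it slices them by.  The report names, measures and dimensions are all hypothetical.

```python
# Hypothetical report mock-ups: each lists the measures it shows and the
# dimensions it slices them by. Names are illustrative only.
report_mockups = [
    {
        "name": "Monthly Sales by Region",
        "measures": {"sales_amount"},
        "dimensions": {"month", "region"},
    },
    {
        "name": "Daily Sales by Store",
        "measures": {"sales_amount", "units_sold"},
        "dimensions": {"day", "store"},
    },
]

def required_model(mockups):
    """Collect every measure (fact) and dimension the reports call for."""
    measures, dimensions = set(), set()
    for report in mockups:
        measures |= report["measures"]
        dimensions |= report["dimensions"]
    return sorted(measures), sorted(dimensions)

facts, dims = required_model(report_mockups)
print("Facts:", facts)
print("Dimensions:", dims)
```

In this made-up case the second report is the one that sets the grain:  if someone needs daily numbers by store, the warehouse has to hold at least that level of detail, and the monthly and regional views roll up from it.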

So this process will give the direction and set the tone for the rest of the project. 

Now you are well on your way to uncovering the most overlooked benefit in data warehousing.

Get Your WordPress.com Blog Digged in 3 Easy Steps

February 19, 2007

UPDATE:  Since this original post, WordPress has made a better option available for getting folks to Digg your posts.  Now on to the original post…

For those of us who use WordPress.com to host our blogs vs. hosting them ourselves there are some tradeoffs.  One of those tradeoffs is that there are limits to what can be customized in your blog.  Including JavaScript in your posts is one of those limits – it’s not allowed.

That wasn’t such a big deal until I wanted to start including a Digg It! icon in my posts.  I saw it in many of the blogs I read, and thought it would be a great way to get my posts in front of more people.  The problem is that most of the Digg submission buttons required JavaScript.

All except one – the Custom Submission-Only Button.  My original thought was “Great!”  Then I realized that creating the link takes a lot of effort, which would be a pain to do manually and even more painful to do more than once.  And I couldn’t find any tools out there to make it easier.

So I made one – the Digg Submission Link Creator.  It might not be the prettiest tool in the world, but it lets you provide the following:

  • Blog post permalink
  • Title of post
  • Brief description of post
  • Digg topic
  • Digg icon to serve as the link

Then just click the Create Digg Link button and out will pop the HTML in the Digg Link box.

So to start including Digg It links in your posts, just follow these steps:

  1. Create and publish your post
  2. Create the HTML Digg link in the Digg Submission Link Creator tool
  3. Open your post again for editing, click on the “Code” tab and paste the link created in step #2 where you want the link to appear and press “Save”

That’s all there is to it.  And if you want something other than the three icons I offer, feel free to edit the generated HTML so the link wraps whatever image you want.
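
If you are curious what a JavaScript-free link like this might look like under the hood, here is a rough sketch in Python.  It is not the code behind the Digg Submission Link Creator.  The submit URL and its query parameters (phase, url, title, bodytext, topic) are my assumption about how Digg’s submission page works, and the function name and sample values are made up, so check Digg’s own documentation before relying on any of it.

```python
from html import escape
from urllib.parse import urlencode

# Assumption: Digg's submit page accepts these query parameters
# (phase, url, title, bodytext, topic). Verify against Digg's docs.
DIGG_SUBMIT_URL = "http://digg.com/submit"

def build_digg_link(permalink, title, description, topic, icon_url):
    """Return a plain HTML anchor wrapping an icon image; no JavaScript."""
    query = urlencode({
        "phase": "2",
        "url": permalink,
        "title": title,
        "bodytext": description,
        "topic": topic,
    })
    # Escape the ampersands so the URL is valid inside an HTML attribute.
    href = escape(f"{DIGG_SUBMIT_URL}?{query}", quote=True)
    src = escape(icon_url, quote=True)
    return f'<a href="{href}"><img src="{src}" alt="Digg It!" /></a>'

# Hypothetical example values.
print(build_digg_link(
    "http://example.wordpress.com/2007/02/19/my-post/",
    "Hello World",
    "A short description of the post",
    "programming",
    "http://example.com/digg-icon.gif",
))
```

Everything the reader’s browser needs is packed into the link itself, which is why this style of button works on a host that strips out scripts.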

