Archive for the ‘Data Warehousing’ Category

How To Be An Expert In One Easy Step

April 6, 2007

In the old days the title of this post would be RTFM.

The Help menu is there for a reason, and Jeff Smith writes a great piece explaining that everyone can be an “expert” if they just use the Help feature built into the applications, systems and tools they use.  The information is there and, often with very little effort, easy to find.

These are the reasons, in my experience, why people would rather go to the “experts” than build their knowledge themselves.

  • Intimidated by technology
  • Don’t have time to learn something new
  • Don’t have the capacity to learn something new
  • Easier to ask someone else to figure it out
  • Faster to ask someone who already knows or who can find out quickly
  • Delegation
  • Generational differences
  • Personality types - some people would rather deal with people than with computers
  • Hierarchical in nature – not my job 
  • Laziness

Albert Einstein was quoted as saying “I never memorize anything I can look up.”  The key is being able to find it when you need it so you can apply it when you need it.  It’s a critical skill for any career and even more so for those in technology.  It’s one of the top skills I look for when I’m interviewing.

Top 10 Trends in Data Management

April 4, 2007

Looking back at the popular stories of 2006 over at SearchDataManagement.com gives a good indication of where we are heading.  These are their top 10 standouts and trends:

10. Compliance attempts automation. Moving away from Excel in the organization.  It’s amazing how much critical corporate information is trapped in the XLS jungle.  This is a very good thing.

9. Open source business intelligence (BI) invites interest. Ends up being just as expensive, but it’s just a matter of time until open source BI becomes a solid player.

8. Customer data integration reiterates its role. CRM, BI, DW, ODS and MDM do not take the place of CDI.  How many acronyms can YOU pack in one sentence?  No wonder it’s tough getting budget dollars for all these projects.

7. BI and corporate performance management (CPM) continue to converge.  BI won’t be used by the whole organization, but CPM will.

6. Enterprise search finds a foothold. Coined “biggle” for BI and Google.  Search technology is the ETL for text analytics and BI vendors are folding it into their products and lines.

5. Data governance is back (and bad) - Managing data is not an easy job and 90% of data governance projects will fail on their first attempt.  Reminds me of these lines from Ghostbusters:

Dr. Egon Spengler: There’s something very important I forgot to tell you.
Dr. Peter Venkman: What?
Dr. Egon Spengler: Don’t cross the streams.
Dr. Peter Venkman: Why?
Dr. Egon Spengler: It would be bad.
Dr. Peter Venkman: I’m fuzzy on the whole good/bad thing. What do you mean, “bad”?
Dr. Egon Spengler: Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light.
Dr. Ray Stantz: Total protonic reversal.
Dr. Peter Venkman: Right. That’s bad. Okay. All right. Important safety tip. Thanks, Egon.

…OK, that might only be funny to me.

4. Data integration and ETL evolve. SOA is changing everything.

3. IBM, Microsoft make moves.  IBM gets FileNet and Microsoft is making waves with its upcoming PerformancePoint Server 2007 for the mid market.

2. Data quality vendors are assimilated.  It’s very good that data quality is becoming part of the machinery instead of an add-on.

1. MDM attracts ample attention. It’s got mine.

What Everyone Should Know About Master Data Management

April 2, 2007

Found a treasure trove of easy-to-understand information on what an MDM hub is and how you create and manage it over at Roger Wolter’s blog.  With information continuing to grow in the organization, managing all that meta data is becoming more critical.

Different approaches to creating the MDM store are in MDM Master Data Management Hub Architecture.

Great detail about versioning and hierarchies in MDM Master Data Management Hub Architecture – Versions and Hierarchies.  If you are familiar with slowly changing dimensions in data warehousing, the versioning will look familiar.  It’s also very clear how an MDM hub would help drive the versioning of your DW.

Excellent piece with examples on not only the initial update but how to get your arms around what’s needed to synchronize the MDM hub in MDM Master Data Management Hub Architecture – Population and Synchronization.

The tools and management needed to operationalize the MDM hub are in MDM Master Data Management Hub Architecture – Maintenance and operation.

Finally, a lot more information on what MDM is all about is covered in the last post, MDM Master Data Management Hub Architecture – Reading Material.  While looking at all his Microsoft links might make you think this is a lot of CRUD (humor might be lost if not familiar with the C.R.U.D. acronym), the information is non-vendor specific for the most part.

If you don’t know much about meta data management, by the end of this series you will. 

Trench Report – SSIS and SQL Server 2005

April 1, 2007

Dan Linstedt has posted some interesting tidbits from the field on using SQL Server 2005 and SSIS.  He has a wish-list for SSIS and SQL Server 2005.  Couldn’t agree more with his request for SS, but I wouldn’t vote for Visio functionality as the data modeling interface.  I’ve used Visio for modeling more than a couple of times and it can be painful and a little rudimentary.  It’s better than what’s built into the server tools, but I would like to see something easier to use with better features.

He talks a bit earlier in the post about the pains of table scans and poor query performance.  I can’t really comment on what he’s seeing because I’m not sure what he is doing or what the schema/configuration looks like, but what I can say is that if you are seeing a table scan in your execution plan then there is no optimization going on.  He’s definitely filtering the data down a bit, so there are a couple of things that he probably could do:

  • Clustered indexes – lay down the data on disk on your most common filtering criteria.  Limits the zigzagging on disk that can eat read performance away when pulling lots of data off disk.
  • Eliminate Row/Index Free Space – Use the fill factor and pad index features to make sure your rows and indexes are packed tight.  This will fill more pages and reduce the number of pages accessed for queries.
  • Minimize Size of Columns Used for Indexes – Have a reference table with only 100 values?  If the PK is an INT (4 bytes) when a TINYINT (1 byte) would do, then you’re using four times the space.  Small tweaks make a big difference when dealing with a lot of data.
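
To get a feel for why packing pages tight and trimming key widths matter, here is a rough back-of-the-envelope sketch in Python. It is not from Dan’s post or from any SQL Server tool; the function names, the 10 million row count and the 100-byte row size are made up for illustration, and it assumes fixed-width rows, roughly 8,096 bytes of row space per 8 KB page, and no per-row overhead. Real numbers will differ, so treat it as an estimate only.

```python
import math

# Assumption: about 8,096 bytes of an 8 KB page are available for rows.
# Per-row overhead is ignored to keep the arithmetic simple.
PAGE_DATA_BYTES = 8096


def pages_needed(row_count: int, row_bytes: int, fill_factor: int = 100) -> int:
    """Estimate pages for fixed-width rows at a fill factor of 1-100 percent."""
    usable = PAGE_DATA_BYTES * fill_factor // 100  # bytes filled per page
    rows_per_page = usable // row_bytes            # whole rows that fit
    return math.ceil(row_count / rows_per_page)


def key_bytes(row_count: int, key_width: int) -> int:
    """Total bytes spent on one key column across all rows."""
    return row_count * key_width


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical table size
    print("pages, 100-byte rows, fill factor 100:", pages_needed(rows, 100))
    print("pages, 100-byte rows, fill factor 70: ", pages_needed(rows, 100, 70))
    # Swapping a 4-byte INT key for a 1-byte TINYINT shaves 3 bytes per row.
    print("pages, 97-byte rows, fill factor 100: ", pages_needed(rows, 97))
    print("key bytes, INT vs TINYINT:", key_bytes(rows, 4), key_bytes(rows, 1))
```

Fewer pages means fewer reads for the same query, which is the whole point of the last two bullets.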

Of course, since he’s using a single disk, that’s going to set him back some.  At the very least you should have two channels so you can set up the physical storage of the server and database to read and write without the bottlenecks.

UPDATE:  Dan posted a great follow-up in response to my post on his blog that goes into more detail about what he’s working with.  Tons of good advice and perspective.  After reading his post and thoughtful comments I realized that I exemplified what can go wrong when you try and oversimplify a solution to a complex problem when you don’t have all the information.  Would like to say I planned it that way to prove a point, but I’m not that smart.  

What is Data Warehouse 2.0?

March 27, 2007

Good article by Bill Inmon that discusses how data warehousing will evolve to adapt to how the business uses the data and be able to tap into the largely unstructured data that makes up most of an organization’s information.  This is what he’s terming DW 2.0.

If you are not sure what the difference is between structured, semi-structured and unstructured data, then you should head over to Dan Linstedt’s blog where he has a breakdown of each.  That post also goes into more of the promise DW 2.0 holds for the business.

Mining unstructured data, finding patterns, tagging data with perspectives and drawing correlations and probabilities is pretty exciting.  For example, back when I was in the Pathology Informatics department at UPMC we collaborated with the Pittsburgh Supercomputing Institute to pick out patterns in images of tissue samples.  The goal was to have a slide captured, analyzed and labeled as to whether cancer exists in the tissue or not.

We’re getting to the point where we should begin to expect that the business intelligence we get from all our data becomes more proactive.  Even learns from us.  So that when those dashboards fire up on Monday morning our decision support systems should be more like decision suggest systems.

What’s Not a Data Warehouse

March 27, 2007

It’s amazing how many things get termed a data warehouse that are not data warehouses.  Claudia Imhoff has a good piece on why data integration projects are not data warehouse projects.  Nomenclature breaks down pretty quickly in DW/BI projects (or related projects that get mislabeled as such).  Most of the time, when I’ve heard “Oh sure, we’ve got a data warehouse” it turns out to be nothing more than another silo.

