Rick Sherman wrote about three common mistakes beginners make when architecting their DW/BI platform. Though he doesn’t state it explicitly, these are the mistakes of data architects who are used to designing transactional systems rather than reporting systems. My experience echoes his. It can take time, and lots of conversations, to help developers shift to a different paradigm.
These are the three mistakes along with my perspective on each.
1. Letting enterprise applications inspire the architecture.
The applications used for running the business are meant for capturing data about the business, not running reports. In most applications, reporting is an afterthought; the mindset is that if we can get the data in, then we can get it out. The problem is that an OLTP schema is not an effective reporting layer. Usability, efficiency and maintainability as a reporting environment all suffer. Also, heavy reporting can wreak havoc on an OLTP system as it locks up records and holds tables hostage until the query completes.
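To make the locking point concrete, here’s a minimal sketch using SQLite and a made-up orders table. SQLite’s default journal mode locks the whole database, which exaggerates the finer-grained row and table locks of a production RDBMS, but the principle is the same: a long-running “report” holding a read transaction open keeps an OLTP write from committing.

```python
import sqlite3

# One shared on-disk database so the two connections contend for locks.
# (Hypothetical table; SQLite locks the whole database, a blunt version
# of the row/table locking you'd see in a production RDBMS.)
setup = sqlite3.connect("orders.db")
setup.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)")
setup.execute("INSERT INTO orders (amount) VALUES (100.0)")
setup.commit()
setup.close()

reporter = sqlite3.connect("orders.db")
reporter.execute("BEGIN")  # the long-running "report" opens a read transaction...
total = reporter.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

writer = sqlite3.connect("orders.db", timeout=1)  # stop waiting after 1 second
try:
    writer.execute("INSERT INTO orders (amount) VALUES (42.0)")
    writer.commit()  # ...so the OLTP write can't commit and times out here
except sqlite3.OperationalError as exc:
    print(f"OLTP write blocked by the report: {exc}")  # "database is locked"
finally:
    reporter.rollback()  # the report finishes; writers can proceed again
    writer.close()
    reporter.close()
```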
BI vendors help foster some of this attitude with their sales force. During the sales process, one of their tactics is to take your existing data, pump it through their tool and show reports within a few days (with 24 hours being the goal). They usually leave the source schemas as is and make the necessary transformations within the BI tool instead.
It’s a powerful sales technique because the decision makers see their data in a way they never had before, and it all looks soooo easy. Of course, this is done with a small subset of data in a very controlled environment. These demos give the impression that you can use your existing data schemas as-is. However, you don’t want to turn your BI tool into your ETL tool.
2. Engaging in DW schema wars.
Those who live and breathe 3NF are taught that denormalizing data is bad practice. It’s unheard of! Heck, many have at least tried to get their models into 4NF (maybe even 5NF?) at some point in their career. I have.
Explaining why normalizing the reporting layer is a bad idea can be a challenge. I can’t tell you how many conversations I’ve had with data architects to get them on board with denormalizing the data for a data mart. There have been times when I had convinced someone, only to have them come back later to pick up the argument again.
People tend to take a stand on what they know because they don’t know what they don’t know. It boils down to using the right tool, or model, for the job. On the way from the source systems to the BI tool, the data is going to have several landing spots. Most of those stages call for entity-relationship modelling, while the destination used for reporting calls for dimensional modelling (denormalizing the data), as in the sketch below.
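Here’s a minimal sketch of that last hop, using pandas and invented tables: a chain of normalized lookups (customers, cities, regions) gets flattened into one wide customer dimension so reports don’t have to chase joins.

```python
import pandas as pd

# Hypothetical normalized (3NF-style) source tables.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Acme Corp", "Globex"],
    "city_id": [10, 20],
})
cities = pd.DataFrame({
    "city_id": [10, 20],
    "city": ["Boston", "Chicago"],
    "region_id": [100, 200],
})
regions = pd.DataFrame({
    "region_id": [100, 200],
    "region": ["Northeast", "Midwest"],
})

# Denormalize: collapse the chain of lookups into one wide customer
# dimension so reports join it straight to a fact table and nothing else.
dim_customer = (
    customers
    .merge(cities, on="city_id")
    .merge(regions, on="region_id")
    [["customer_id", "name", "city", "region"]]
)
print(dim_customer)
```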
3. Snubbing summary tables.
Here I don’t disagree with Rick. However, I do believe it is a good idea to keep the grain of the data as low as possible without sacrificing performance and integrity. It really depends on the data, how it will be used and what the requirements are.
But there is nothing wrong with summary tables. All reporting at the BI layer is aggregating on some level. If the details aren’t needed for drill-throughs, it’s good practice to make that aggregation happen in the ETL process instead of on the fly. Better to have the aggregation done once than every time the report is run.
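As a minimal sketch of that idea (the tables and column names are invented): an ETL step rolls low-grain order lines up to a daily summary once, and every report that doesn’t need line-level drill-through reads the small table instead.

```python
import pandas as pd

# Hypothetical order-line detail at the lowest grain.
order_lines = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "product": ["widget", "gadget", "widget", "widget"],
    "quantity": [3, 1, 5, 2],
    "revenue": [30.0, 25.0, 50.0, 20.0],
})

# Aggregate once, in ETL, to a daily/product grain; every report that
# doesn't need line-level detail reads this small table instead of
# re-aggregating the detail on the fly.
daily_sales = (
    order_lines
    .groupby(["order_date", "product"], as_index=False)
    .agg(units=("quantity", "sum"), revenue=("revenue", "sum"))
)
print(daily_sales)
```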
The best way to avoid these mistakes is to understand that the data models and architectures that support a fast, scalable and reliable DW/DM/BI environment are different from those that support an OLTP system.