Archive for March, 2007

Brief Bits: Linked Lists of Lists Patented

March 29, 2007

Over at Slashdot it’s being reported that, finally, someone has patented linked lists.  My favorite comment in the wild and woolly discussion going on over there is by tedgyz, who states, “I would show the prior art, but I can’t read the reel-to-reel tapes.”

And, following that linked list theme, over at Coding Horror is a top list of top programming lists.  Maybe Jeff should patent his post before someone else beats him to it.

Google Buys Sword Swallower

March 28, 2007

Google's Sword Swallower - Hans Rosling

Over at the TED Blog there’s this post which talks about Google’s acquisition of the Trendalyzer software developed by the guy snacking on steel in the photo above.  Looks like they bought both the statistical talent and the great visualization tool.  When you get a chance, check out some of the great visual applications over at Gapminder.org.

What is Data Warehouse 2.0?

March 27, 2007

Good article by Bill Inmon that discusses how data warehousing will evolve to adapt to how the business uses the data and to tap into the largely unstructured data that makes up most of an organization’s information.  This is what he’s calling DW 2.0.

If you are not sure what the difference is between structured, semi-structured and unstructured data, then you should head over to Dan Linstedt’s blog where he has a breakdown of each.  That post also goes into more of the promise that DW 2.0 holds for the business.

Mining unstructured data, finding patterns, tagging data with perspectives and drawing correlations and probabilities are pretty exciting.  For example, back when I was in the Pathology Informatics department at UPMC we collaborated with the Pittsburgh Supercomputing Institute to pick out patterns in images of tissue samples.  The goal was to have a slide captured, analyzed and labeled as to whether cancer exists in the tissue or not.

We’re getting to the point where we should begin to expect the business intelligence we get from all our data to become more proactive, even to learn from us.  When those dashboards fire up on Monday morning, our decision support systems should be more like decision suggest systems.

What’s Not a Data Warehouse

March 27, 2007

It’s amazing how many things get termed a data warehouse that aren’t data warehouses.  Claudia Imhoff has a good piece on why data integration projects are not data warehouse projects.  Nomenclature breaks down pretty quickly in DW/BI projects (or related projects that get mislabeled as such).  Most of the time, when I’ve heard “Oh sure, we’ve got a data warehouse,” it turns out to be nothing more than another silo.

Brief Bits: Schedule Chicken, A Cup of Joe and Predicting the Future

March 22, 2007

URLs that came across my browser today. 

If you’ve been on a number of projects you’ve probably seen the game known as Schedule Chicken. Funny but true.  Peter Clark talks about the meaninglessness of percent completion (“we’re 90% of the way there”) and the need to be willing to deal with project challenges transparently and head-on.  Found the link from Dare Obasanjo’s blog post on the Top Ten Signs Your Software Project is Doomed.

Next on the list – I don’t know about you, but I’m always worrying if I’m getting my daily requirement of caffeine.  Now there’s a tool that can help.  Energy Fiend has a Caffeine Database that allows you to not only see what the caffeine content is in virtually every beverage out there, but it also allows you to calculate how much you’ve had.  I’m somewhere between “OK. That’s enough” and “Ever hear of moderation?”  Link courtesy of LifeHacker.

Once you’ve got all that extra energy to spare, head over to Andrew Moore’s Statistical Data Mining Tutorials.  This is awesome material.  These are all PDFs of slides from lectures he’s given.  Normally slides are not a great learning device without the presenter, but these stand on their own very well.  He’s working for Google now and recruiting for the new office location in Pittsburgh, PA. Tip – always try to work with people who are smarter than you.  If I were still back in the Burg I’d be begging him for a job.

What’s In Your PK?

March 21, 2007

Primary keys: INTs versus GUIDs.  Jeff likes GUIDs.  Go through the comments and find Rami because he (she?) has it right – if you have any kind of size to your database you are going to take a major hit on performance just getting data on and off the disk.  While there’s just a smidge of room to waffle on this in transactional systems, the answer is a resounding NO to GUIDs as PKs for data warehouses/marts.
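
To put rough numbers on the disk argument, here is a minimal Python sketch.  It assumes a 4-byte INT and a 16-byte GUID, and the 100 million row count is made up for illustration.  The helper names (key_bytes, is_append_only) are mine, not from any database product, and the sketch ignores index, page and fragmentation overhead, so treat it as a back-of-the-envelope estimate only.

```python
import struct
import uuid


def key_bytes(row_count: int) -> dict:
    """Raw key storage in bytes, ignoring index and page overhead (assumption)."""
    int_size = struct.calcsize("<i")       # 4-byte integer key
    guid_size = len(uuid.uuid4().bytes)    # a GUID/UUID is 16 bytes
    return {"int": row_count * int_size, "guid": row_count * guid_size}


def is_append_only(keys) -> bool:
    """True when every key sorts after the one before it, so each new
    row would land at the end of an index ordered on that key."""
    return all(a < b for a, b in zip(keys, keys[1:]))


# Hypothetical fact table with 100 million rows.
sizes = key_bytes(100_000_000)

# Sequential INT keys arrive in sorted order; random GUIDs do not.
int_keys = list(range(1, 1001))
guid_keys = [uuid.uuid4().bytes for _ in range(1000)]

if __name__ == "__main__":
    print("INT key bytes: ", sizes["int"])
    print("GUID key bytes:", sizes["guid"])
    print("INT keys append-only: ", is_append_only(int_keys))
    print("GUID keys append-only:", is_append_only(guid_keys))
```

The GUID column is four times the size of the INT column before you count a single index, and in a warehouse that key usually gets repeated as a foreign key in every fact row that points at it.  The random ordering is the other half of the story: new GUIDs scatter across an ordered index instead of landing at the end.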

