MSHGQM - Star Schemas are Stupid

Published 2024-05-18
Description:
What is Deckler's Maxim? Find out in this video!

Download MSHGQM: github.com/gdeckler/MicrosoftHatesGregsQuickMeasur…

Books!
The Definitive Guide to Power Query (M) - www.amazon.com/Definitive-Guide-Power-Query-Transf…

The Definitive Guide to Power Query (M) - www.amazon.com/Definitive-Guide-Power-Query-Transf…

Mastering Power BI 2nd Ed. - www.amazon.com/Mastering-Microsoft-Power-interacti…

Learn Power BI 2nd Ed. - www.amazon.com/Learn-Power-comprehensive-step-step…

DAX Cookbook - www.amazon.com/DAX-Cookbook-analytics-reporting-in…

Power BI Cookbook 2nd Ed. - www.amazon.com/Microsoft-Power-Cookbook-expertise-…

All Comments (10)
  • @MDevion
    Few notes: 1. A flat table is slower for example with DISTINCTCOUNT vs a SUMMARIZE with a star schema. 2. The whole reason for a star schema is to have shared dimensions between facts. If you ever have a second fact up popping up you need to redesign everything, costing more times. And thats is very likely, even in simple models. 3. Because its a bit more complex, doesnt mean its worse. Yes a 12 year old without a license can drive a car, doesnt mean its a good idea. 4. Time Intelligence is hell with a flat table. 5. You dont have to use BIDIRECTIONAL relationships. The dimensions visual can be filtered on COUNTROWS of fact table is not empty, and you will get the same result. Also, you dont always want this behaviour anyway. 6. A star schema is easily explained and ignorance should not be an excuse not to do it. 7. If you hide all non measures in a fact, you will show up as a fact, making it easier to discern dimensions from facts. That is not possible with a flat table if you still want to be able to select attributes. While Im a big fan of Keep It Simple Stupid, a star schema is a bare necessity for Power BI developers, new or experienced.
  • @vdj6298
    This is a tiny 1 thousand row dataset , not even 1 million let alone thousands. Star schema is far more performant than a single table & it is extremely easy to setup.
  • @douglascory
    It would be interesting to see one of these but comparing the Incremental Refresh Performance in the PBI Service (With vs Without)
  • Awesome points, no nonsense approach thanks Greg. Nb, dont you hate the gifs on the forum, I sure do?
  • It is heresy to refuse to drink the Star Schema Kool Aid. And yet, as Greg points out, most Power BI reporting runs will relatively small datasets, which is obviously true: Simple Pareto Analysis. In my experience, any dimensions under five million rows work very well when partially denormalized. Of course, maybe I am just a very, very good architect. And the Schema Fanatics? Often full of hot air (the optimistic substance). It is easy to forget Classic YAGNI - You Ain't Gonna Need It, at least for the first few iterations of any app or reporting requirement. And that's about as far as 99% of them go. Complexity is often High Tech Onanism - yeah, a feel good ain't I really clever - love me & my magic tricks, rather than getting to the nut of the matter. Regarding those who are concerned about scaling to "entereprise grade", certainly a concern for a handful of critical apps, note that you only need to scale UP because you are working with local rinky-dink reporting solutions rather than feeding DOWN from well-designed and professionally curated Data Marts.
  • @hamishmaxa6509
    Just a side note. You need to add 2-3 seconds per field you add to the table visual. Because, you know, the default 'helpful' behaviour of Power BI is to go ahead and aggregate columns to Count of or Sum of, because........
  • I guess my counter is that you learn a lot more thinking at an enterprise level. And a lot of business users should be trying to think at a larger scale because it's good for them :). I also find a proper semantic model easy to understand whereas a flat table is just one big mess. I would like to see more discussion on this. I also thought it was understood that vertipaq engine is so efficient even flat tables get nice performance for simple use cases. But at scale flat tables can falter. If you have evidence contradicting the Sqlbi guys I am sure they would love to see it. I don't think they would disagree about small data size.
  • Deckler's Maxim? LOL. Personally, I create a lot of semantic models in a user-led, self-service environment but I am always thinking about how my models could be deployed at Enterprise scale. I solve the slicer issue by putting a COUNTROWS DAX function in my fact table, then adding it as a filter to the slicer visual with filter value > 0. It solves the issue & I use this technique all the time. I won't speak to size & query performance issues since you are quite adept at that & I am still learning query timings. But I will say that a star schema w/ relatively uncluttered fact tables, with dimensions pushed out as 'lookup' tables, makes for a simpler data model from a design perspective and user comprehension standpoint. I definitely concede your point that Power BI newbies can get by with a single table model and, hopefully, later on migrate to more complex model structures.