Physically Placing the Maps in the architecture
Before we leave the maps behind, I need to live up to my promise of describing the storage characteristics of tables visited during the journey through the warehouse architecture. This must include the physical location of maps. Since believe form must follow function in a DW, let us just recall their function:
From a functional perspective, I have shown you how map tables can be used to both track and correct source system keys. Maps are not visible to the end user, but they are a necessary part of the data’s journey from the source to the final data model. Maps also provide the abstraction of, or interface to, master data sources. In the absence of those sources – the maps can even serve as a makeshift master data repository.
As we travelled along the ETL road, I also introduced you to the Entities tables (which we shall soon just call “Type1 dimensions”, because that is what they are). I deferred the exact definition of them, but we have seen how the Entities hold the master key for every entity in the warehouse, along with some book keeping columns. Entities are our next stop, but during our brief visit to them previously, we noticed that there is a foreign key between the maps and the entities, with the entity holding the primary, master key.
Because foreign keys need to look up rows on every insert and update, it makes a lot of sense to physically locate tables related through foreign keys on the same machine: co-locating frequently joined tables make joins between them much faster. Furthermore, to preserve transactional integrity, the foreign and primary key tables must live inside the same transactional container. In the case of SQL server, this means that maps and entities must live in the same database – since that is the container of transactional integrity which does not require distributed transactions.
Whenever we add a new entity key, this is driven by adding a new map entry (for a new entity to appear, it must have been observed in at least one source). Because of this, we want to make sure a backup of entities always includes maps that are consistent with that data. In SQL Server terms, on a large database, this means that maps and entities live in the same file group. In DB2 and Oracle, I believe the term is table space for the same concept (the smallest unit of consistent backup).
Because of the above we will physically locate maps and entities together like this:
In my next post in the modeling series, we will spend more time with the Entities – keepers of the master keys and type 1 history.
I will not be covering master data in further detail in this blog, see the references section below if you want to dig in.
- Category: DW and Big Data