Sentencias MERGE
La sentencia MERGE carga los datos en una tabla destino a partir del resultado de una consulta de origen.
Existen las siguientes variantes MERGE. Cada una de ella responde a una estrategia de carga en proyectos ETL/DWH:
| Nombre | Descripción |
|---|---|
| MERGE CLONE | Añade los registros nuevos, actualiza los existentes que hayan cambiado, y elimina los registros ausentes en el origen |
| MERGE UPSERT (o MERGE) | Añade los registros nuevos y actualiza los existentes (que hayan cambiado) |
| MERGE ALL | Añade los registros nuevos y actualiza los existentes (hayan cambiado o no) |
| MERGE UPDATE | Actualiza los existentes que hayan cambiado, sin añadir ninguno ni eliminar ninguno. Es igual que el UPDATE de Crono SQL |
| MERGE SOFT DELETE | Añade los registros nuevos, actualiza los existentes que hayan cambiado, e informa la fecha de eliminación de los registros ausentes en el origen |
| MERGE INCREMENTAL | Añade los registros nuevos, sin actualizar ni eliminar ninguno. Es igual que el INSERT de Crono SQL |
| MERGE HISTORY | Añade los registros nuevos manteniendo la historia de los cambios en los campos “fecha inicio vigencia” y “fecha fin vigencia”. Es una carga SCD2 |
Estas 7 estrategias, o un subconjunto de ellas, suelen ser suficientes para implementar cualquier proyecto ETL/DWH. Habitualmente, las estrategias MERGE CLONE y MERGE UPSERT se utilizan para cargar tablas de dimensión, y MERGE CLONE y MERGE INCREMENTAL se utilizan para cargar tablas de hechos. De todos modos, eso dependerá de las necesidades de cada tabla y de cada proyecto.
Todas las instrucciones MERGE crean la tabla de destino si es necesario, así como la clave de negocio, y los campos de auditoria.
La sintaxis es muy sencilla. Simplemente hay que indicar la consulta de origen (especificando la “clave de actualización”) y la tabla de destino. A continuación se incluyen ejemplos de uso de estas 7 estrategias.
MERGE CLONE
Sección titulada «MERGE CLONE»La instrucción MERGE CLONE realiza una actualización completa de la tabla. Es decir: Añade, elimina o actualiza todos los registros de la tabla destino para que coincidan exactamente con el contenido de la consulta de origen. Convierte literalmente la tabla destino en un clon de la consulta de origen.
Se utiliza en tablas de hechos pequeñas o en aquellas tablas de dimensión en que sabemos que no se eliminan registros, o que es aceptable que se eliminen registros obsoletos. Se trata de la estrategia más común en cualquier proyecto DWH (en muchas ocasiones MERGE CLONE es la única instrucción necesaria).
La sintaxis es idéntica a la utilizada en la sintaxis Crono SQL de UPDATE o INSERT, tal y como muestra este ejemplo.
MERGE CLONE dwh.DimProducts(ProductSid)select #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)Ver SQL compilado
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.WeightWHEN NOT MATCHED THEN INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight) VALUES ( query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.Weight)WHEN NOT MATCHED BY SOURCE THEN DELETE;Es importante observar que el campo ProductId está precedido por el carácter numeral #. Esta marca sirve para identificar la clave de actualización. Habitualmente coincide con la clave de negocio (código único que identifica a cada registro).
El código generado realiza un MERGE e informa convenientemente los campos FechaAlta, FechaBaja y FechaModificacion.
Si la tabla no existe en el momento de ejecución, la creará añadiéndole el campo ProductSid como una clave subrogada (un entero autonumérico INDENTITY). También creará un UNIQUE CONSTRAINT para el campo ProductId (que debe ser único).
Si en el futuro se modifica la consulta, añadiendo más campos a la dimensión, Crono SQL generará también el código necesario para crear esos nuevos campos en la tabla.
En este ejemplo, después de los JOIN, se ha incluido la clausula CHECK SNOWFLAKE. Por lo tanto, antes de realizar la carga, Crono SQL verificará que la consulta de origen no pierda ni duplique registros de staging.Products.
El código generado es óptimo y presenta un gran rendimiento, por lo que esta estrategia es adecuada también en tablas de hechos no excesivamente grandes:
MERGE CLONE dwh.FactSalesOrderDetails(SalesOrderDetailSid)SELECT SalesOrderDetail.SalesOrderDetailID #SalesOrderDetailID, DimProducts.ProductSid ProductSid NONUNIQUE REFERENCES dwh.DimProducts, FactSalesOrderHeader.SalesOrderSid SalesOrderSid NONUNIQUE REFERENCES dwh.FactSalesOrderHeader, SalesOrderDetail.CarrierTrackingNumber, SalesOrderDetail.OrderQty, SalesOrderDetail.UnitPrice, SalesOrderDetail.UnitPriceDiscount, SalesOrderDetail.LineTotal, SpecialOffer.Description SpecialOffer, SpecialOffer.Type SpecialOfferType, SpecialOffer.Category SpecialOfferCategory,from @@erp.SalesOrderDetail SalesOrderDetailinner join @@erp.SalesOrderHeader using (SalesOrderID)inner join @@erp.SpecialOffer using (SpecialOfferID)inner join @@erp.Product using (ProductID)inner join dwh.DimProducts using Product(ProductID)inner join dwh.FactSalesOrderHeader using SalesOrderHeader(SalesOrderID)check snowflakeVer SQL compilado
IF EXISTS ( SELECT count(*) FROM @@erp.SalesOrderDetail LEFT JOIN @@erp.SalesOrderHeader ON (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID) LEFT JOIN @@erp.SpecialOffer ON (SalesOrderDetail.SpecialOfferID=SpecialOffer.SpecialOfferID) LEFT JOIN @@erp.Product ON (SalesOrderDetail.ProductID=Product.ProductID) LEFT JOIN dwh.DimProducts ON (Product.ProductID=DimProducts.ProductID) LEFT JOIN dwh.FactSalesOrderHeader ON (SalesOrderHeader.SalesOrderID=FactSalesOrderHeader.SalesOrderID) HAVING count(CASE WHEN SalesOrderHeader.SalesOrderID IS NOT NULL AND SpecialOffer.SpecialOfferID IS NOT NULL AND Product.ProductID IS NOT NULL AND DimProducts.ProductID IS NOT NULL AND FactSalesOrderHeader.SalesOrderID IS NOT NULL THEN 1 END) <> (SELECT count(*) FROM @@erp.SalesOrderDetail)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de SalesOrderDetail.',1
;WITHquery AS ( SELECT SalesOrderDetail.SalesOrderDetailID AS SalesOrderDetailID, DimProducts.ProductSid AS ProductSid, FactSalesOrderHeader.SalesOrderSid AS SalesOrderSid, SalesOrderDetail.CarrierTrackingNumber AS CarrierTrackingNumber, SalesOrderDetail.OrderQty AS OrderQty, SalesOrderDetail.UnitPrice AS UnitPrice, SalesOrderDetail.UnitPriceDiscount AS UnitPriceDiscount, SalesOrderDetail.LineTotal AS LineTotal, SpecialOffer.Description AS SpecialOffer, SpecialOffer.Type AS SpecialOfferType, SpecialOffer.Category AS SpecialOfferCategory FROM @@erp.SalesOrderDetail INNER JOIN @@erp.SalesOrderHeader ON (SalesOrderDetail.SalesOrderID=SalesOrderHeader.SalesOrderID) INNER JOIN @@erp.SpecialOffer ON (SalesOrderDetail.SpecialOfferID=SpecialOffer.SpecialOfferID) INNER JOIN @@erp.Product ON (SalesOrderDetail.ProductID=Product.ProductID) INNER JOIN dwh.DimProducts ON (Product.ProductID=DimProducts.ProductID) INNER JOIN dwh.FactSalesOrderHeader ON (SalesOrderHeader.SalesOrderID=FactSalesOrderHeader.SalesOrderID))MERGE dwh.FactSalesOrderDetails AS FactSalesOrderDetailsUSING query ON query.SalesOrderDetailID=FactSalesOrderDetails.SalesOrderDetailIDWHEN MATCHED AND ((FactSalesOrderDetails.ProductSid<>query.ProductSid OR (FactSalesOrderDetails.ProductSid IS NULL AND query.ProductSid IS NOT NULL) OR (FactSalesOrderDetails.ProductSid IS NOT NULL AND query.ProductSid IS NULL) OR FactSalesOrderDetails.SalesOrderSid<>query.SalesOrderSid OR (FactSalesOrderDetails.SalesOrderSid IS NULL AND query.SalesOrderSid IS NOT NULL) OR (FactSalesOrderDetails.SalesOrderSid IS NOT NULL AND query.SalesOrderSid IS NULL) OR FactSalesOrderDetails.CarrierTrackingNumber<>query.CarrierTrackingNumber OR (FactSalesOrderDetails.CarrierTrackingNumber IS NULL AND query.CarrierTrackingNumber IS NOT NULL) OR (FactSalesOrderDetails.CarrierTrackingNumber IS NOT NULL AND query.CarrierTrackingNumber IS NULL) OR FactSalesOrderDetails.OrderQty<>query.OrderQty OR (FactSalesOrderDetails.OrderQty IS NULL AND query.OrderQty IS NOT NULL) OR (FactSalesOrderDetails.OrderQty IS NOT NULL AND query.OrderQty IS NULL) OR FactSalesOrderDetails.UnitPrice<>query.UnitPrice OR (FactSalesOrderDetails.UnitPrice IS NULL AND query.UnitPrice IS NOT NULL) OR (FactSalesOrderDetails.UnitPrice IS NOT NULL AND query.UnitPrice IS NULL) OR FactSalesOrderDetails.UnitPriceDiscount<>query.UnitPriceDiscount OR (FactSalesOrderDetails.UnitPriceDiscount IS NULL AND query.UnitPriceDiscount IS NOT NULL) OR (FactSalesOrderDetails.UnitPriceDiscount IS NOT NULL AND query.UnitPriceDiscount IS NULL) OR FactSalesOrderDetails.LineTotal<>query.LineTotal OR (FactSalesOrderDetails.LineTotal IS NULL AND query.LineTotal IS NOT NULL) OR (FactSalesOrderDetails.LineTotal IS NOT NULL AND query.LineTotal IS NULL) OR FactSalesOrderDetails.SpecialOffer<>query.SpecialOffer OR (FactSalesOrderDetails.SpecialOffer IS NULL AND query.SpecialOffer IS NOT NULL) OR (FactSalesOrderDetails.SpecialOffer IS NOT NULL AND query.SpecialOffer IS NULL) OR FactSalesOrderDetails.SpecialOfferType<>query.SpecialOfferType OR (FactSalesOrderDetails.SpecialOfferType IS NULL AND query.SpecialOfferType IS NOT NULL) OR (FactSalesOrderDetails.SpecialOfferType IS NOT NULL AND query.SpecialOfferType IS NULL) OR FactSalesOrderDetails.SpecialOfferCategory<>query.SpecialOfferCategory OR (FactSalesOrderDetails.SpecialOfferCategory IS NULL AND query.SpecialOfferCategory IS NOT NULL) OR (FactSalesOrderDetails.SpecialOfferCategory IS NOT NULL AND query.SpecialOfferCategory IS NULL))) THEN UPDATE SET ProductSid=query.ProductSid, SalesOrderSid=query.SalesOrderSid, CarrierTrackingNumber=query.CarrierTrackingNumber, OrderQty=query.OrderQty, UnitPrice=query.UnitPrice, UnitPriceDiscount=query.UnitPriceDiscount, LineTotal=query.LineTotal, SpecialOffer=query.SpecialOffer, SpecialOfferType=query.SpecialOfferType, SpecialOfferCategory=query.SpecialOfferCategoryWHEN NOT MATCHED THEN INSERT (SalesOrderDetailID,ProductSid,SalesOrderSid,CarrierTrackingNumber,OrderQty,UnitPrice,UnitPriceDiscount,LineTotal,SpecialOffer,SpecialOfferType,SpecialOfferCategory) VALUES ( query.SalesOrderDetailID, query.ProductSid, query.SalesOrderSid, query.CarrierTrackingNumber, query.OrderQty, query.UnitPrice, query.UnitPriceDiscount, query.LineTotal, query.SpecialOffer, query.SpecialOfferType, query.SpecialOfferCategory)WHEN NOT MATCHED BY SOURCE THEN DELETE;Incluso si la tabla de origen tuviese unos pocos millones de registros, la carga anterior se ejecutaría rápidamente (con el Hardware adecuado).
En cambio, si la tabla de origen tuviese varias decenas de millones de registros (o más) recomendaríamos utilizar alguna estrategia de carga incremental.
MERGE UPSERT
Sección titulada «MERGE UPSERT»La instrucción MERGE UPSERT es una carga de dimensión lentamente cambiante tipo 1. Esta estrategia de carga actualiza los registros que se han modificado desde la última carga y añade los nuevos, pero nunca elimina los registros existentes.
Se utiliza habitualmente para cargar tablas de dimensión, ya que se requiere mantener los registros antiguos para respetar la integridad (no queremos borrar un producto o un cliente que tal vez tiene ventas u otras transacciones asociadas).
MERGE UPSERT dwh.DimProducts(ProductSid)SELECT #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)CHECK SNOWFLAKEVer SQL compilado
IF EXISTS ( SELECT count(*) FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID) HAVING count(*) <> (SELECT count(*) FROM staging.Product)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.WeightWHEN NOT MATCHED THEN INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight) VALUES ( query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.Weight);Es importante observar que el campo ProductId está precedido por el carácter numeral #. Esta marca es importante ya que sirve para identificar la clave de actualización. Habitualmente coincide con la clave de negocio (código único que identifica a cada registro)
El código generado realiza un MERGE para actualizar o insertar cada registro. También informa convenientemente los campos FechaAlta, FechaBaja y FechaModificacion.
Al igual que en todas las instrucciones MERGE de Crono SQL, se creará la tabla si no existe. También se añadirán los campos, índices o restricciones que falten.
Las estrategía UPSERT es la predeterminada del MERGE. Por este motivo, el siguiente ejemplo es exactamente equivalente a ejecutar MERGE UPSERT.
MERGE dwh.DimProducts(ProductSid)select #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)CHECK SNOWFLAKEVer SQL compilado
IF EXISTS ( SELECT count(*) FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID) HAVING count(*) <> (SELECT count(*) FROM staging.Product)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.WeightWHEN NOT MATCHED THEN INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight) VALUES ( query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.Weight);MERGE ALL
Sección titulada «MERGE ALL»La instrucción MERGE ALL se comparta igual que MERGE UPSERT. La única diferencia es que MERGE ALL actualiza todos los registros (hayan cambiado o no). Esta diferencia raramente aportará ningún beneficio (¡Al contrario… penalizará el rendimiento por actualizar registros que no lo necesitan!), por lo que se prefiere usar MERGE UPSERT.
MERGE ALL dwh.DimProducts(ProductSid)SELECT #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)CHECK SNOWFLAKEVer SQL compilado
IF EXISTS ( SELECT count(*) FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID) HAVING count(*) <> (SELECT count(*) FROM staging.Product)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.WeightWHEN NOT MATCHED THEN INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight) VALUES ( query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.Weight);MERGE UPDATE
Sección titulada «MERGE UPDATE»La instrucción MERGE UPDATE actualiza los registros que aparecen en la consulta de origen y ya existen en la tabla de destino. No elimina ni inserta ningún registro.
MERGE UPDATE dwh.DimProducts(ProductSid)SELECT #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)CHECK SNOWFLAKEVer SQL compilado
IF EXISTS ( SELECT count(*) FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID) HAVING count(*) <> (SELECT count(*) FROM staging.Product)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.Weight;La instrucción MERGE UPDATE es equivalente a la instrucción UPDATE de Crono SQL
MERGE SOFT DELETE
Sección titulada «MERGE SOFT DELETE»La instrucción MERGE SOFT DELETE actualiza los registros que se han modificado desde la última carga, añade los nuevos y establece la fecha de eliminación de los registros eliminados en el origen. Es decir, en lugar de eliminar físicamente los registros, hace un borrado lógico.
Se utiliza habitualmente para cargar tablas de dimensión, ya que se requiere mantener los registros antiguos para respetar la integridad (no queremos borrar un producto o un cliente que tal vez tiene ventas u otras transacciones asociadas).
MERGE SOFT DELETE dwh.DimProducts(ProductSid)select #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)Ver SQL compilado
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, StandardCost=query.StandardCost, ListPrice=query.ListPrice, Size=query.Size, SizeUnitMeasureCode=query.SizeUnitMeasureCode, Weight=query.WeightWHEN NOT MATCHED THEN INSERT (ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight) VALUES ( query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.Weight);MERGE INCREMENTAL
Sección titulada «MERGE INCREMENTAL»La instrucción MERGE INCREMENTAL realiza una carga incremental de la tabla.
Las cargas incrementales son adecuadas para tablas de hechos con muchos millones de registros.
Frecuentemente las cargas incrementales son problemáticas:
- La mayor dificultad de las cargas incrementales es determinar los registros que se deben añadir. Es necesario identificar en origen los registros nuevos desde la última carga (mediante un timestamp, habitualmente).
- Otro riesgo de las cargas incrementales es que modificaciones extraordinarias en el origen pueden no reflejarse en en DWH (no siempre los responsables del ERP puede asegurar que ningún proceso o incidencia pueda modificar algún día registros antiguos).
Por estos motivos, si el tiempo de ejecución es aceptable, es preferible utilizar MERGE CLONE siempre que sea posible, aunque el tiempo de ejecución sea algo mayor.
MERGE INCREMENTAL dwh.FactSalesOrderHeader(SalesOrderSid)SELECT SalesOrderHeader.SalesOrderId, Customer.CustomerId, cast(SalesOrderHeader.OrderDate as date) OrderDate, SalesOrderHeader.SalesOrderNumber, cast(SalesOrderHeader.DueDate as date) DueDate, cast(SalesOrderHeader.ShipDate as date) ShipDate, SalesOrderHeader.OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber, SalesOrderHeader.AccountNumber, SalesOrderHeader.Freight, SalesOrderHeader.CreditCardApprovalCode, SalesOrderHeader.SubTotal Amount, SalesOrderHeader.TaxAmtFROM staging.SalesOrderHeaderINNER JOIN staging.customer using SalesOrderHeader(customerId)where cast(OrderDate as date)=today()Ver SQL compilado
;WITHquery AS ( SELECT SalesOrderHeader.SalesOrderId AS SalesOrderId, Customer.CustomerId AS CustomerId, CAST(SalesOrderHeader.OrderDate AS date) AS OrderDate, SalesOrderHeader.SalesOrderNumber AS SalesOrderNumber, CAST(SalesOrderHeader.DueDate AS date) AS DueDate, CAST(SalesOrderHeader.ShipDate AS date) AS ShipDate, SalesOrderHeader.OnlineOrderFlag AS OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber AS PurchaseOrderNumber, SalesOrderHeader.AccountNumber AS AccountNumber, SalesOrderHeader.Freight AS Freight, SalesOrderHeader.CreditCardApprovalCode AS CreditCardApprovalCode, SalesOrderHeader.SubTotal AS Amount, SalesOrderHeader.TaxAmt AS TaxAmt FROM staging.SalesOrderHeader INNER JOIN staging.customer ON (SalesOrderHeader.customerId=customer.customerId) WHERE CAST(CAST(SalesOrderHeader.OrderDate AS date) AS date)=cast(getdate() as date))INSERT dwh.FactSalesOrderHeader(SalesOrderId,CustomerId,OrderDate,SalesOrderNumber,DueDate,ShipDate,OnlineOrderFlag,PurchaseOrderNumber,AccountNumber,Freight,CreditCardApprovalCode,Amount,TaxAmt)SELECT query.SalesOrderId, query.CustomerId, query.OrderDate, query.SalesOrderNumber, query.DueDate, query.ShipDate, query.OnlineOrderFlag, query.PurchaseOrderNumber, query.AccountNumber, query.Freight, query.CreditCardApprovalCode, query.Amount, query.TaxAmtFROM query;La anterior sentencia cargará las ventas del día anterior (y solo las del día anterior). Evidentemente, esta estrategia es débil y propensa a errores. Dejará de cargar registros si algún día no se ejecuta la carga, o podría duplicarlos si se ejecutase dos veces un mismo día. También fallaría si algún proceso del ERP se retrasase varios días en insertar los datos de alguna tienda…
La siguiente estrategia resuelve parcialmente el problema:
MERGE INCREMENTAL dwh.FactSalesOrderHeader(SalesOrderSid)SELECT SalesOrderHeader.SalesOrderId #SalesOrderId, Customer.CustomerId, cast(SalesOrderHeader.OrderDate as date) OrderDate, SalesOrderHeader.SalesOrderNumber, cast(SalesOrderHeader.DueDate as date) DueDate, cast(SalesOrderHeader.ShipDate as date) ShipDate, SalesOrderHeader.OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber, SalesOrderHeader.AccountNumber, SalesOrderHeader.Freight, SalesOrderHeader.CreditCardApprovalCode, SalesOrderHeader.SubTotal Amount, SalesOrderHeader.TaxAmtFROM staging.SalesOrderHeaderINNER JOIN staging.customer using SalesOrderHeader(customerId)where OrderDate<=getdate()-30Ver SQL compilado
;WITHquery AS ( SELECT SalesOrderHeader.SalesOrderId AS SalesOrderId, Customer.CustomerId AS CustomerId, CAST(SalesOrderHeader.OrderDate AS date) AS OrderDate, SalesOrderHeader.SalesOrderNumber AS SalesOrderNumber, CAST(SalesOrderHeader.DueDate AS date) AS DueDate, CAST(SalesOrderHeader.ShipDate AS date) AS ShipDate, SalesOrderHeader.OnlineOrderFlag AS OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber AS PurchaseOrderNumber, SalesOrderHeader.AccountNumber AS AccountNumber, SalesOrderHeader.Freight AS Freight, SalesOrderHeader.CreditCardApprovalCode AS CreditCardApprovalCode, SalesOrderHeader.SubTotal AS Amount, SalesOrderHeader.TaxAmt AS TaxAmt FROM staging.SalesOrderHeader INNER JOIN staging.customer ON (SalesOrderHeader.customerId=customer.customerId) WHERE CAST(SalesOrderHeader.OrderDate AS date)<=getdate()-30)INSERT dwh.FactSalesOrderHeader(SalesOrderId,CustomerId,OrderDate,SalesOrderNumber,DueDate,ShipDate,OnlineOrderFlag,PurchaseOrderNumber,AccountNumber,Freight,CreditCardApprovalCode,Amount,TaxAmt)SELECT query.SalesOrderId, query.CustomerId, query.OrderDate, query.SalesOrderNumber, query.DueDate, query.ShipDate, query.OnlineOrderFlag, query.PurchaseOrderNumber, query.AccountNumber, query.Freight, query.CreditCardApprovalCode, query.Amount, query.TaxAmtFROM queryLEFT JOIN dwh.FactSalesOrderHeader ON (FactSalesOrderHeader.SalesOrderId=query.SalesOrderId)WHERE FactSalesOrderHeader.SalesOrderId IS NULL;En el caso anterior se cargarían las ventas de los últimos 30 días que no se hayan cargado previamente. La marca # especifica la clave de inserción, es decir, que no se insertarán SalesOrderId que ya existan en la tabla de destino. Esta estrategia es recomendable si tenemos la seguridad de que:
- Ninguna venta tarda más de 30 días en cargarse en el sistema
- Los registros de venta una vez insertados en el origen no cambian nunca.
Otra opción sería añadir incrementalmente aquellos registros añadidos en el ERP desde la última carga (requiere un campo “timestamp” en el origen).
DECLARE @last timestamp=(select max(InsertTimestamp) from FactSalesOrderHeader)MERGE INCREMENTAL dwh.FactSalesOrderHeader(SalesOrderSid)SELECT SalesOrderHeader.SalesOrderId, Customer.CustomerId, cast(SalesOrderHeader.OrderDate as date) OrderDate, SalesOrderHeader.SalesOrderNumber, cast(SalesOrderHeader.DueDate as date) DueDate, cast(SalesOrderHeader.ShipDate as date) ShipDate, SalesOrderHeader.OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber, SalesOrderHeader.AccountNumber, SalesOrderHeader.Freight, SalesOrderHeader.CreditCardApprovalCode, SalesOrderHeader.SubTotal Amount, SalesOrderHeader.TaxAmt, SalesOrderHeader.InsertTimestampFROM staging.SalesOrderHeaderINNER JOIN staging.customer using SalesOrderHeader(customerId)where SalesOrderHeader.InsertTimestamp>@lastVer SQL compilado
DECLARE @last timestamp=(SELECT max(InsertTimestamp) AS expr1FROM FactSalesOrderHeader);
;WITHquery AS ( SELECT SalesOrderHeader.SalesOrderId AS SalesOrderId, Customer.CustomerId AS CustomerId, CAST(SalesOrderHeader.OrderDate AS date) AS OrderDate, SalesOrderHeader.SalesOrderNumber AS SalesOrderNumber, CAST(SalesOrderHeader.DueDate AS date) AS DueDate, CAST(SalesOrderHeader.ShipDate AS date) AS ShipDate, SalesOrderHeader.OnlineOrderFlag AS OnlineOrderFlag, SalesOrderHeader.PurchaseOrderNumber AS PurchaseOrderNumber, SalesOrderHeader.AccountNumber AS AccountNumber, SalesOrderHeader.Freight AS Freight, SalesOrderHeader.CreditCardApprovalCode AS CreditCardApprovalCode, SalesOrderHeader.SubTotal AS Amount, SalesOrderHeader.TaxAmt AS TaxAmt, SalesOrderHeader.InsertTimestamp AS InsertTimestamp FROM staging.SalesOrderHeader INNER JOIN staging.customer ON (SalesOrderHeader.customerId=customer.customerId) WHERE SalesOrderHeader.InsertTimestamp>@last)INSERT dwh.FactSalesOrderHeader(SalesOrderId,CustomerId,OrderDate,SalesOrderNumber,DueDate,ShipDate,OnlineOrderFlag,PurchaseOrderNumber,AccountNumber,Freight,CreditCardApprovalCode,Amount,TaxAmt,InsertTimestamp)SELECT query.SalesOrderId, query.CustomerId, query.OrderDate, query.SalesOrderNumber, query.DueDate, query.ShipDate, query.OnlineOrderFlag, query.PurchaseOrderNumber, query.AccountNumber, query.Freight, query.CreditCardApprovalCode, query.Amount, query.TaxAmt, query.InsertTimestampFROM query;Observad como en este caso se ha declarado la variable @last con la fecha del último registro insertado.
En resumen, las cargas incrementales son adecuadas para aquellas situaciones en las que el origen tiene muchos millones de registros y solo se producen inserciones.
La instrucción MERGE INCREMENTAL es equivalente a la instrucción INSERT de Crono SQL.
MERGE HISTORY
Sección titulada «MERGE HISTORY»La instrucción MERGE HISTORY es una carga de dimensión lentamente cambiante tipo 2. Esta estrategia guarda la historia completa de los cambios utilizando los campos de fecha de inicio y fecha de fin vigencia. De esta manera, es posible conocer el contenido que tenía cualquier registro en una fecha dada. En esta estrategia de carga no se eliminan ni actualizan registros nunca. La instrucción MERGE HISTORY únicamente añade los registros que han cambiado desde la última carga (y actualiza los campos de vigencia que correspondan).
Se utiliza para cargar tablas de dimensión en las que es necesario guardar la historia de cambios.
La sintaxis es la misma que en los casos anteriores (y la mismas que la del INSERT o UPDATE…).Únicamente es necesario cambiar el nombre de la estrategia a utilizar: MERGE HISTORY
MERGE HISTORY dwh.DimProducts(ProductSid)select #ProductID, Product.Name Product, ProductCategory.name ProductCategory, ProductSubCategory.name ProductSubCategory, ProductNumber, ProductModel.name ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, WeightFROM staging.ProductLEFT JOIN staging.ProductSubCategory using Product(ProductSubcategoryID)LEFT JOIN staging.ProductCategory using ProductSubCategory(ProductCategoryId)LEFT JOIN staging.ProductModel using Product(ProductModelID)CHECK SNOWFLAKEVer SQL compilado
IF EXISTS ( SELECT count(*) FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID) HAVING count(*) <> (SELECT count(*) FROM staging.Product)) THROW 50001,'Las relaciones de esta consulta pierden o duplican registros de Product.',1
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductID=DimProducts.ProductIDWHEN MATCHED AND ((DimProducts.Product<>query.Product OR (DimProducts.Product IS NULL AND query.Product IS NOT NULL) OR (DimProducts.Product IS NOT NULL AND query.Product IS NULL) OR DimProducts.ProductCategory<>query.ProductCategory OR (DimProducts.ProductCategory IS NULL AND query.ProductCategory IS NOT NULL) OR (DimProducts.ProductCategory IS NOT NULL AND query.ProductCategory IS NULL) OR DimProducts.ProductSubCategory<>query.ProductSubCategory OR (DimProducts.ProductSubCategory IS NULL AND query.ProductSubCategory IS NOT NULL) OR (DimProducts.ProductSubCategory IS NOT NULL AND query.ProductSubCategory IS NULL) OR DimProducts.ProductNumber<>query.ProductNumber OR (DimProducts.ProductNumber IS NULL AND query.ProductNumber IS NOT NULL) OR (DimProducts.ProductNumber IS NOT NULL AND query.ProductNumber IS NULL) OR DimProducts.ProductModel<>query.ProductModel OR (DimProducts.ProductModel IS NULL AND query.ProductModel IS NOT NULL) OR (DimProducts.ProductModel IS NOT NULL AND query.ProductModel IS NULL) OR DimProducts.Color<>query.Color OR (DimProducts.Color IS NULL AND query.Color IS NOT NULL) OR (DimProducts.Color IS NOT NULL AND query.Color IS NULL) OR DimProducts.StandardCost<>query.StandardCost OR (DimProducts.StandardCost IS NULL AND query.StandardCost IS NOT NULL) OR (DimProducts.StandardCost IS NOT NULL AND query.StandardCost IS NULL) OR DimProducts.ListPrice<>query.ListPrice OR (DimProducts.ListPrice IS NULL AND query.ListPrice IS NOT NULL) OR (DimProducts.ListPrice IS NOT NULL AND query.ListPrice IS NULL) OR DimProducts.Size<>query.Size OR (DimProducts.Size IS NULL AND query.Size IS NOT NULL) OR (DimProducts.Size IS NOT NULL AND query.Size IS NULL) OR DimProducts.SizeUnitMeasureCode<>query.SizeUnitMeasureCode OR (DimProducts.SizeUnitMeasureCode IS NULL AND query.SizeUnitMeasureCode IS NOT NULL) OR (DimProducts.SizeUnitMeasureCode IS NOT NULL AND query.SizeUnitMeasureCode IS NULL) OR DimProducts.Weight<>query.Weight OR (DimProducts.Weight IS NULL AND query.Weight IS NOT NULL) OR (DimProducts.Weight IS NOT NULL AND query.Weight IS NULL))) THEN UPDATE SET;
;WITHquery AS ( SELECT ProductID, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Color, StandardCost, ListPrice, Size, SizeUnitMeasureCode, Weight FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))INSERT dwh.DimProducts(ProductID,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,StandardCost,ListPrice,Size,SizeUnitMeasureCode,Weight)SELECT query.ProductID, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.StandardCost, query.ListPrice, query.Size, query.SizeUnitMeasureCode, query.WeightFROM queryLEFT JOIN dwh.DimProducts ON (DimProducts.ProductID=query.ProductID)WHERE DimProducts.ProductID IS NULL;El código generado realiza un MERGE para actualizar las fechas de fin vigencia, y un INSERT para añadir los registros que han cambiado y los nuevos registros.
También en este caso, Crono SQL genera automáticamente al código repetitivo:
- Creación de la tabla y los campos necesarios
- Creación de la clave subrogada
- Creación y carga de los campos de auditoria
- Comprobación de la corrección de los JOIN (ya que aparece la cláusula CHECK SNOWFLAKE).
::: tip RECUERDA Es interesante observar que la elección entre MERGE CLONE, MERGE UPSERT y MERGE HISTORY depende únicamente de las necesidades de negocio. Con Crono SQL cuesta exactamente lo mismo programar una carga SCD de tipo 1 o de tipo 2. :::
Sintaxis MERGE estándar
Sección titulada «Sintaxis MERGE estándar»El lenguaje Crono SQL admite también la sintaxis SQL estándar de la sentencia MERGE:
WITHquery AS ( SELECT ProductID AS ProductId, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Product.Color AS Color, Product.StandardCost AS ProductCost FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductId=DimProducts.ProductIdWHEN MATCHED THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, ProductCost=query.ProductCostWHEN NOT MATCHED THEN INSERT ( ProductId, Product, ProductCategory, ProductSubCategory, ProductNumber, ProductModel, Color,ProductCost) VALUES ( query.ProductId, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.ProductCost)Ver SQL compilado
WITHquery AS ( SELECT ProductID AS ProductId, Product.Name AS Product, ProductCategory.name AS ProductCategory, ProductSubCategory.name AS ProductSubCategory, ProductNumber, ProductModel.name AS ProductModel, Product.Color AS Color, Product.StandardCost AS ProductCost FROM staging.Product LEFT JOIN staging.ProductSubCategory ON (Product.ProductSubcategoryID=ProductSubCategory.ProductSubcategoryID) LEFT JOIN staging.ProductCategory ON (ProductSubCategory.ProductCategoryId=ProductCategory.ProductCategoryId) LEFT JOIN staging.ProductModel ON (Product.ProductModelID=ProductModel.ProductModelID))MERGE dwh.DimProducts AS DimProductsUSING query ON query.ProductId=DimProducts.ProductIdWHEN MATCHED THEN UPDATE SET Product=query.Product, ProductCategory=query.ProductCategory, ProductSubCategory=query.ProductSubCategory, ProductNumber=query.ProductNumber, ProductModel=query.ProductModel, Color=query.Color, ProductCost=query.ProductCostWHEN NOT MATCHED THEN INSERT (ProductId,Product,ProductCategory,ProductSubCategory,ProductNumber,ProductModel,Color,ProductCost) VALUES ( query.ProductId, query.Product, query.ProductCategory, query.ProductSubCategory, query.ProductNumber, query.ProductModel, query.Color, query.ProductCost);La sentencia MERGE, en función de una serie de condiciones definidas, ejecutará un UPDATE de los registros que hayan cambiado y un INSERT de los registros de la consulta origen que no existan en la tabla destino. Por este motivo se le conoce como UPSERT (UPDATE+INSERT).
Debería ser la “sentencia estrella” de cualquier proyecto ETL/DWH, porque en estos proyectos lo que en definitiva se necesita es replicar los datos del origen en la tabla destino (¡exactamente lo que promete hacer el MERGE!). Lamentablemente, la sintaxis ISO es compleja, está llena de repeticiones, y es muy difícil escribir correctamente. Por este motivo, muchos desarrolladores la desconocen, o no la usan, y buscan métodos alternativos para realizar el trabajo (con mucho peor rendimiento, habitualmente).
En un proyecto Crono SQL nunca es necesario utilizar la sintaxis estándar, y se proponer utilizar siempre la sintaxis compacta de MERGE CLONE, MERGE UPSERT, etc.
Resumen
Sección titulada «Resumen»La sentencia MERGE permite cargar una tabla aplicando alguna de las estrategias de carga disponibles. Las más habituales son:
- MERGE CLONE: Actualización completa
- MERGE UPSERT (o simplemente MERGE): Dimensión lentamente cambiante tipo 1
- MERGE INCREMENTAL: Carga incremental
- MERGE HISTORY: Dimensión lentamente cambiante tipo 2
El uso de un conjunto cerrado y reducido de estrategias aporta muchos beneficios:
- El uso de estrategias asegura un código homogéneo y fácil de mantener, y permite que el desarrollador se centre en aquellas actividades que realmente aportan valor.
- Una característica importante del lenguaje es que al programador le costará exactamente lo mismo aplicar cualquier estrategia de carga. Es decir, la decisión de utilizar una u otra no se verá influenciada por la dificultad de programarlo, por lo que la decisión se tomará en función de las necesidades del negocio.
- Además, el código SQL generado está altamente optimizado para obtener siempre el máximo rendimiento. Crono SQL no actua como una caja negra sino que delega totalmente el trabajo a quien mejor sabe hacerlo (el motor de la base de datos).
- Crono SQL genera automáticamente todo el código repetitivo. No hemos de preocuparnos ni de crear la tabla. Todas las tablas tendrán campos de auditoría correctamente informados. Ni siquiera hemos de escribir el farragoso código necesario para ejecutar un MERGE, un UPDATE, o un INSERT.