Hi - I'm trying to come up with an efficient way to compare two datasets and pinpoint their differences. Ideally, I'd like a generic method that doesn't require explicitly stating column names, so it can be used for any dataset.
Here is what I have so far...
I put the results of the baseline query into a global temp table called ##Old and the results of the revised query into a global temp table called ##New. Running this code then isolates the rows where the two datasets differ:
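
    -- Rows in ##Old with no exact match in ##New.
    -- EXCEPT and UNION ALL are evaluated left to right, so the parentheses make the grouping explicit.
    (
        select source = 'old', * from ##Old
        except
        select source = 'old', * from ##New
    )
    union all
    -- Rows in ##New with no exact match in ##Old
    (
        select source = 'new', * from ##New
        except
        select source = 'new', * from ##Old
    );
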
It's a good start. However, when there are thousands of exception rows and several columns, it needs some refinement to nail down exactly which columns cause the exceptions for each key combination.
Here is some sample code to work with for this problem:

    declare @Exceptions table (
        Source varchar(3),
        PlantID int,
        OrderID int,
        ItemID int,
        DateOrdered datetime,
        DateProduced datetime,
        QuantityOrdered decimal(12,2),
        QuantityProduced decimal(12,2),
        QuantityScrapped decimal(12,2),
        InventoryCode varchar(10),
        PRIMARY KEY CLUSTERED (Source, PlantID, OrderID)
    );

    insert into @Exceptions (Source, PlantID, OrderID, ItemID, DateOrdered, DateProduced,
                             QuantityOrdered, QuantityProduced, QuantityScrapped, InventoryCode)
    values
        ('new', 1217, 560, 123, '2013-08-09 14:35:29.123', '2013-08-10 14:28:17.456', 958.5, 921.7, 2.6, 'A'),
        ('old', 1217, 560, 123, '2013-08-09 13:35:29.123', '2013-08-10 14:28:17.456', 958.5, 921.7, 7.6, 'A'), --DateOrdered, QuantityScrapped are different
        ('new', 1218, 560, 456, '2013-08-16 15:28:30.000', '2013-08-17 2:46:15.000', 764.4, 778.3, 0.0, 'B'),
        ('old', 1218, 560, 456, '2013-08-16 15:28:30.000', '2013-08-17 1:40:10.444', 760.0, 778.3, 0.0, 'BC'), --DateProduced, QuantityOrdered, InventoryCode are different
        ('new', 1217, 561, 456, '2013-08-20 11:16:14.165', '2013-08-20 22:33:22.000', 844.7, 890.7, 1.8, 'C'),
        ('old', 1217, 561, 456, '2013-08-20 11:16:14.165', '2013-08-20 22:33:22.000', 840.7, 956.0, 1.8, 'A'); --QuantityOrdered, QuantityProduced, InventoryCode are different

    select * from @Exceptions order by PlantID, OrderID, Source;
    go

Here is a picture of the desired output:
Note that non-string columns display a value in the Difference column, while string columns just display NULL there.
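To make the Difference rule concrete, here is a rough sketch. It assumes each differing old/new pair has already been brought into one row with both values stored as sql_variant (the @pairs table and its columns are hypothetical). Numeric base types get new minus old, and dates are reported as elapsed seconds, which is only a guess at the unit the output should use.

```sql
-- Sketch only: @pairs is a hypothetical shape for "one row per differing column".
-- Numbers: new minus old. Dates: seconds from old to new (assumed unit). Anything else: NULL.
declare @pairs table (
    PlantID    int,
    OrderID    int,
    ColumnName sysname,
    OldValue   sql_variant,
    NewValue   sql_variant
);

insert into @pairs values
    (1217, 560, N'DateOrdered',
        cast(cast('2013-08-09 13:35:29.123' as datetime) as sql_variant),
        cast(cast('2013-08-09 14:35:29.123' as datetime) as sql_variant)),
    (1217, 560, N'QuantityScrapped',
        cast(cast(7.6 as decimal(12,2)) as sql_variant),
        cast(cast(2.6 as decimal(12,2)) as sql_variant)),
    (1217, 561, N'InventoryCode',
        cast('A' as sql_variant),
        cast('C' as sql_variant));

select PlantID, OrderID, ColumnName, OldValue, NewValue,
       Difference = case
           -- Numeric base types: new minus old.
           when cast(sql_variant_property(NewValue, 'BaseType') as sysname)
                in (N'tinyint', N'smallint', N'int', N'bigint', N'decimal',
                    N'numeric', N'money', N'smallmoney', N'float', N'real')
               then cast(NewValue as decimal(38, 10)) - cast(OldValue as decimal(38, 10))
           -- Date/time base types: elapsed seconds from old to new.
           when cast(sql_variant_property(NewValue, 'BaseType') as sysname)
                in (N'date', N'datetime', N'datetime2', N'smalldatetime')
               then datediff(second, cast(OldValue as datetime2), cast(NewValue as datetime2))
           -- Strings and all other types: no meaningful difference.
           else null
       end
from @pairs
order by PlantID, OrderID, ColumnName;
```

Casting everything numeric to decimal(38, 10) keeps the CASE to a single result type, but very large float values would overflow it, so that choice would need revisiting for such data.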
It may be possible to use UNPIVOT for this problem, though the columns' different data types present a challenge, because UNPIVOT requires all of the columns being unpivoted to have the same data type.
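One possible way around the data-type restriction, sketched here on a cut-down copy of the sample table, is to cast every column to sql_variant in a derived table before unpivoting. The column names are hard-coded only to show the mechanics.

```sql
-- Sketch: UNPIVOT needs one common type, so each column is cast to sql_variant first.
declare @t table (
    Source           varchar(3),
    PlantID          int,
    OrderID          int,
    DateOrdered      datetime,
    QuantityScrapped decimal(12,2),
    InventoryCode    varchar(10)
);

insert into @t values
    ('new', 1217, 560, '2013-08-09 14:35:29.123', 2.6, 'A'),
    ('old', 1217, 560, '2013-08-09 13:35:29.123', 7.6, 'A');

select u.Source, u.PlantID, u.OrderID, u.ColumnName, u.Value
from (
    -- Key columns stay as-is; value columns become sql_variant.
    select Source, PlantID, OrderID,
           DateOrdered      = cast(DateOrdered as sql_variant),
           QuantityScrapped = cast(QuantityScrapped as sql_variant),
           InventoryCode    = cast(InventoryCode as sql_variant)
    from @t
) as src
unpivot (Value for ColumnName in (DateOrdered, QuantityScrapped, InventoryCode)) as u
order by u.PlantID, u.OrderID, u.ColumnName, u.Source;
```

UNPIVOT silently drops NULL values, so a column that is NULL on one side only would have to be caught with a FULL JOIN of the unpivoted old and new rows rather than an inner join.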
Please keep in mind that the solution must be able to support all data types and it must not require explicitly stating column names.
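To avoid naming columns, the unpivot lists could be generated from tempdb metadata. This is a sketch under several assumptions: the table has a primary key that identifies the key combination, STRING_AGG is available (SQL Server 2017 or later), and ##OldSketch is a hypothetical throwaway stand-in for ##Old so the snippet runs on its own. sql_variant can't hold types such as varchar(max), nvarchar(max), varbinary(max) or xml, so columns of those types would still need separate handling to meet the all-data-types requirement.

```sql
-- Sketch only: ##OldSketch is a hypothetical stand-in for ##Old.
if object_id(N'tempdb..##OldSketch') is not null drop table ##OldSketch;

create table ##OldSketch (
    PlantID          int not null,
    OrderID          int not null,
    DateOrdered      datetime,
    QuantityScrapped decimal(12,2),
    InventoryCode    varchar(10),
    primary key (PlantID, OrderID)
);

insert into ##OldSketch values (1217, 560, '2013-08-09 13:35:29.123', 7.6, 'A');

declare @table  sysname = N'##OldSketch',
        @keys   nvarchar(max),
        @casts  nvarchar(max),
        @inList nvarchar(max),
        @sql    nvarchar(max);

-- Key columns: read from the primary key, in key order.
select @keys = string_agg(cast(quotename(c.name) as nvarchar(max)), N', ')
               within group (order by ic.key_ordinal)
from tempdb.sys.indexes as i
join tempdb.sys.index_columns as ic
  on ic.object_id = i.object_id and ic.index_id = i.index_id
join tempdb.sys.columns as c
  on c.object_id = ic.object_id and c.column_id = ic.column_id
where i.object_id = object_id(N'tempdb..' + @table)
  and i.is_primary_key = 1;

-- Non-key columns: each cast to sql_variant so UNPIVOT sees a single type.
select @casts  = string_agg(cast(quotename(c.name) + N' = cast(' + quotename(c.name)
                     + N' as sql_variant)' as nvarchar(max)), N', ')
                 within group (order by c.column_id),
       @inList = string_agg(cast(quotename(c.name) as nvarchar(max)), N', ')
                 within group (order by c.column_id)
from tempdb.sys.columns as c
where c.object_id = object_id(N'tempdb..' + @table)
  and not exists (
      select 1
      from tempdb.sys.indexes as i
      join tempdb.sys.index_columns as ic
        on ic.object_id = i.object_id and ic.index_id = i.index_id
      where i.object_id = c.object_id
        and i.is_primary_key = 1
        and ic.column_id = c.column_id);

-- One row per key combination and non-NULL, non-key column.
set @sql = N'select ' + @keys + N', ColumnName, Value
from (select ' + @keys + N', ' + @casts + N' from ' + quotename(@table) + N') as src
unpivot (Value for ColumnName in (' + @inList + N')) as u;';

exec sys.sp_executesql @sql;

drop table ##OldSketch;
```

Running the same generated query against ##New and joining the two results on the key columns plus ColumnName would then give one row per differing column.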
Any ideas about how to go about solving this would be greatly appreciated.