I have an execution plan in which an index scan is performed on a nonclustered index. This is a parallel operation, and the rows are split across six threads. Because the data is distributed between threads at page-level granularity, I expect the number of rows per thread to be skewed. The next operator along is a Compute Scalar, followed by a Repartition Streams operator. I expected the number of rows per thread coming out of the Repartition Streams to be more evenly balanced than when the data went in, but it is not; if anything, it is worse than the number of rows per thread coming out of the index scan.
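To make that expectation concrete, here is a toy Python model of it. It is only a sketch and not how SQL Server actually hands out pages or repartitions rows: the function names, the page row counts, the page-to-thread round robin, and the row-level round robin in the second step are all assumptions made up for illustration.

```python
N_THREADS = 6

# Hypothetical number of qualifying rows on each index page (made-up data).
PAGE_ROW_COUNTS = [100, 5, 80, 10, 60, 20, 100, 5, 90, 15, 70, 25]


def rows_after_page_level_scan(page_row_counts, n_threads):
    """Assign whole pages to threads (assumed: page i goes to thread i % n)."""
    totals = [0] * n_threads
    for page_no, rows in enumerate(page_row_counts):
        totals[page_no % n_threads] += rows
    return totals


def rows_after_row_level_repartition(total_rows, n_threads):
    """Assumed row-by-row round robin: row r goes to thread r % n."""
    totals = [0] * n_threads
    for row_no in range(total_rows):
        totals[row_no % n_threads] += 1
    return totals


scan_totals = rows_after_page_level_scan(PAGE_ROW_COUNTS, N_THREADS)
repartitioned_totals = rows_after_row_level_repartition(
    sum(PAGE_ROW_COUNTS), N_THREADS
)

print("rows per thread after page-level scan:   ", scan_totals)
print("rows per thread after row-level repartition:", repartitioned_totals)
```

In this model the first list is heavily skewed because whole pages with very different row counts land on each thread, while the second list differs by at most one row between threads. The actual plan shows the opposite of what the second step of this model predicts.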
What am I missing here?