I have a requirement to get the best match between the columns of two tables, so for example:
Table A
ID | C1 | C2 | C3 | C4 | C5 | C6 |
1 | A | B | ||||
2 | A | D | C | |||
3 | A | |||||
4 | D | |||||
5 | B | |||||
6 | B | C |
Table B
ID | C1 | C2 | C3 | C4 | C5 | C6 | Match |
1 | A | B | B | B | D | C | 1, 3, 5 |
2 | A | C | B | C | C | C | 1, 3, 5, 6 |
3 | B | B | B | C | C | C | 5 |
4 | E | D | B | F | F | E | 4, 5 |
A row matches when, for every column, the values are equal or the Table A value is null. The Match column in Table B shows which rows of Table A that row should match. Once I know which rows match, I can determine the best match by which row matches the most columns. The following query seems to do the job of identifying which rows match:
SELECT TableA.ID, TableB.ID
FROM TableA
LEFT OUTER JOIN TableB
    ON  (TableB.C1 = TableA.C1 OR TableA.C1 IS NULL)
    AND (TableB.C2 = TableA.C2 OR TableA.C2 IS NULL)
    AND (TableB.C3 = TableA.C3 OR TableA.C3 IS NULL)
    AND (TableB.C4 = TableA.C4 OR TableA.C4 IS NULL)
    AND (TableB.C5 = TableA.C5 OR TableA.C5 IS NULL)
    AND (TableB.C6 = TableA.C6 OR TableA.C6 IS NULL)
WHERE NOT (TableA.C1 IS NULL AND TableA.C2 IS NULL AND TableA.C3 IS NULL
       AND TableA.C4 IS NULL AND TableA.C5 IS NULL AND TableA.C6 IS NULL)
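To make the "best match" step concrete, here is a rough sketch of the scoring I have in mind on top of the query above. It uses the same join condition and counts how many columns matched on an actual value, with a null in Table A counting as a wildcard that scores nothing. The MatchedColumns alias is a name I made up for illustration, and I have only written this as generic SQL, so please treat it as a sketch of the intent rather than a tested solution:

```sql
-- Sketch only: same matching rule as the query above, plus a per-pair score.
-- MatchedColumns is a hypothetical alias; a NULL in TableA makes the
-- comparison unknown, so the CASE falls through to ELSE and contributes 0.
SELECT
    TableA.ID AS A_ID,
    TableB.ID AS B_ID,
      CASE WHEN TableB.C1 = TableA.C1 THEN 1 ELSE 0 END
    + CASE WHEN TableB.C2 = TableA.C2 THEN 1 ELSE 0 END
    + CASE WHEN TableB.C3 = TableA.C3 THEN 1 ELSE 0 END
    + CASE WHEN TableB.C4 = TableA.C4 THEN 1 ELSE 0 END
    + CASE WHEN TableB.C5 = TableA.C5 THEN 1 ELSE 0 END
    + CASE WHEN TableB.C6 = TableA.C6 THEN 1 ELSE 0 END AS MatchedColumns
FROM TableA
LEFT OUTER JOIN TableB
    ON  (TableB.C1 = TableA.C1 OR TableA.C1 IS NULL)
    AND (TableB.C2 = TableA.C2 OR TableA.C2 IS NULL)
    AND (TableB.C3 = TableA.C3 OR TableA.C3 IS NULL)
    AND (TableB.C4 = TableA.C4 OR TableA.C4 IS NULL)
    AND (TableB.C5 = TableA.C5 OR TableA.C5 IS NULL)
    AND (TableB.C6 = TableA.C6 OR TableA.C6 IS NULL)
WHERE NOT (TableA.C1 IS NULL AND TableA.C2 IS NULL AND TableA.C3 IS NULL
       AND TableA.C4 IS NULL AND TableA.C5 IS NULL AND TableA.C6 IS NULL)
-- Highest-scoring TableA candidate first for each TableB row.
ORDER BY B_ID, MatchedColumns DESC;
```

The best match for a given Table B row would then be the Table A row with the highest MatchedColumns. This adds more work on top of a join that is already too slow, so the performance question below still applies.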
However, I am finding that when both of these tables have around 400K records, the query takes far too long to run. How can I optimise this query? Are there any indexes which would help, or is there a better way to approach this problem?
Thanks in advance for any help,
Graham.