sql - How to synthesize attribute for joined tables -
i have view defined this:
create view [dbo].[possiblymatchingcontracts] select c.uniqueid, cc.uniqueid possiblymatchingcontracts [dbo].allcontracts c inner join [dbo].allcontracts cc on c.secondarymatchcodefb = cc.secondarymatchcodefb or c.secondarymatchcodelb = cc.secondarymatchcodelb or c.secondarymatchcodebb = cc.secondarymatchcodebb or c.secondarymatchcodelb = cc.secondarymatchcodebb or c.secondarymatchcodebb = cc.secondarymatchcodelb c.uniqueid not in ( select uniqueid [dbo].definitivematches ) , c.associateduser null , c.uniqueid <> cc.uniqueid
which finding contracts f.e. first name , birthday matching. works great. want add synthetic attribute each row value 1 source row.
let me give example make clearer. suppose have following table:
uniqueid | firstname | lastname | birthday 1 | peter | smith | 1980-11-04 2 | peter | gray | 1980-11-04 3 | peter | gray-smith| 1980-11-04 4 | frank | may | 1985-06-09 5 | frank-paul| may | 1985-06-09 6 | gina | ericson | 1950-11-04
the resulting view should this:
uniqueid | possiblymatchingcontracts | syntheticid 1 | 2 | petersmith1980-11-04 1 | 3 | petersmith1980-11-04 2 | 1 | petersmith1980-11-04 2 | 3 | petersmith1980-11-04 3 | 1 | petersmith1980-11-04 3 | 2 | petersmith1980-11-04 4 | 5 | frankmay1985-06-09 5 | 4 | frankmay1985-06-09 6 | null | null [or] ginaericson1950-11-04
notice syntheticid column uses values 1 of matching source rows. doesn't matter one. exporting view application , need able identify each "match group" afterwards.
is clear mean? ideas how done in sql?
maybe helps elaborate bit on actual use case:
i importing contracts different systems. account possibility of typos or people have married last name updated in 1 system, need find called 'possible matches'. 2 or more contracts considered possible match if contain same birthday plus same first, last or birth name. implies, if contract matches contract b, contract b matches contract a.
the target system uses multivalue reference attributes store these relationships. ultimate goal create user objects these contracts. catch first is, shall 1 user object multiple matching contracts. i'm creating these matches in view. second catch is, creation of user objects happens workflows, run parallel each contract. avoid creating multiple user objects matching contracts, each workflow needs check, if there matching user object or workflow, create said user object. because workflow engine extremely slow compared sql, workflows should not repeat whole matching test. idea is, let workflow check 'syntheticid'.
i have solved multi step approach:
- create list of possible 1st level matches each contract
- create base groups list, assigning different group for each contract (as if not related anybody)
- iterate matches list updating group list when more contracts need added group
- recursively build syntheticid final group list
- output results
first of all, let me explain have understood, can tell if approach correct or not.
1) matching propagates in "cascade"
i mean, if "peter smith" grouped "peter gray", means smith , gray related (if have same birth date) luke smith can in same group of john gray
2) have not understood mean "birth name"
you contracts matches on "first, last or birth name", sorry, i'm italian, thought birth name , first same, in data there not such column. maybe related dash symbol between names?
when firstname frank-paul means should match both frank , paul?
when lastname gray-smith means should match both gray , smith?
in following code have ignored problem, handled if needed (i did try, breaking names, unpivoting them , treating double match).
step zero: declaration , prepare base data
declare @cli table (uniqueid int primary key, firstname varchar(20), lastname varchar(20), birthday varchar(20)) declare @comb table (id1 int, id2 int, done bit) declare @grp table (ix int identity primary key, grp int, id int, unique (grp,ix)) declare @str_id table (grp int primary key, syntheticid varchar(1000)) declare @id1 int, @g int ;with t ( select * (values (1 , 'peter' , 'smith' , '1980-11-04'), (2 , 'peter' , 'gray' , '1980-11-04'), (3 , 'peter' , 'gray-smith', '1980-11-04'), (4 , 'frank' , 'may' , '1985-06-09'), (5 , 'frank-paul', 'may' , '1985-06-09'), (6 , 'gina' , 'ericson' , '1950-11-04') ) x (uniqueid , firstname , lastname , birthday) ) insert @cli select * t
step one: create list of possible 1st level matches each contract
;with p as(select uniqueid, birthday, firstname, lastname @cli), m ( select p.uniqueid uniqueid1, p.firstname firstname1, p.lastname lastname1, p.birthday birthday1, pp.uniqueid uniqueid2, pp.firstname firstname2, pp.lastname lastname2, pp.birthday birthday2 p join p pp on (pp.birthday=p.birthday) , (pp.firstname = p.firstname or pp.lastname = p.lastname) p.uniqueid<=pp.uniqueid ) insert @comb select uniqueid1,uniqueid2,0 m
step two: create base groups list
insert @grp select row_number() over(order id1), id1 @comb id1=id2
step three: iterate matches list updating group list loop on contracts have possible matches , updates if needed
set @id1 = 0 while not(@id1 null) begin set @id1 = (select top 1 id1 @comb id1<>id2 , done=0) if not(@id1 null) begin set @g = (select grp @grp id=@id1) update g set grp= @g @grp g inner join @comb c on g.id = c.id2 c.id2<>@id1 , c.id1=@id1 , grp<>@g update @comb set done=1 id1=@id1 end end
step four: build syntheticid recursively add (distinct) first , last names of group syntheticid.
used '_' separator birth date, first names , last names, , ',' separator list of names avoid conflicts.
;with c as( select c.*, g.grp @cli c join @grp g on g.id = c.uniqueid ), d ( select *, row_number() on (partition g order t,s) n1, row_number() on (partition g order t desc,s desc) n2 ( select distinct c.grp g, 1 t, firstname s c union select distinct c.grp, 2, lastname c ) l ), r ( select d.*, cast(convert(varchar(10), t.birthday, 112) + '_' + s varchar(1000)) names, cast(0 bigint) i1, cast(0 bigint) i2 d join @cli t on t.uniqueid=d.g n1=1 union select d.*, cast(r.names + iif(r.t<>d.t,'_',',') + d.s varchar(1000)), r.n1, r.n2 d join r on r.g = d.g , r.n1=d.n1-1 ) insert @str_id select g, names r n2=1
step five: output results
select c.uniqueid, case when id2=uniqueid id1 else id2 end possiblematchingcontract, s.syntheticid @cli c left join @comb cb on c.uniqueid in(id1,id2) , id1<>id2 left join @grp g on c.uniqueid = g.id left join @str_id s on s.grp = g.grp
here results
uniqueid possiblematchingcontract syntheticid 1 2 1980-11-04_peter_gray,gray-smith,smith 1 3 1980-11-04_peter_gray,gray-smith,smith 2 1 1980-11-04_peter_gray,gray-smith,smith 2 3 1980-11-04_peter_gray,gray-smith,smith 3 1 1980-11-04_peter_gray,gray-smith,smith 3 2 1980-11-04_peter_gray,gray-smith,smith 4 5 1985-06-09_frank,frank-paul_may 5 4 1985-06-09_frank,frank-paul_may 6 null 1950-11-04_gina_ericson
i think in way resulting syntheticid should "unique" each group
Comments
Post a Comment