sql - How to synthesize attribute for joined tables -


i have view defined this:

create view [dbo].[possiblymatchingcontracts] select      c.uniqueid,     cc.uniqueid possiblymatchingcontracts  [dbo].allcontracts c     inner join [dbo].allcontracts cc         on c.secondarymatchcodefb = cc.secondarymatchcodefb             or c.secondarymatchcodelb = cc.secondarymatchcodelb             or c.secondarymatchcodebb = cc.secondarymatchcodebb             or c.secondarymatchcodelb = cc.secondarymatchcodebb             or c.secondarymatchcodebb = cc.secondarymatchcodelb c.uniqueid not in     (         select uniqueid [dbo].definitivematches     )     , c.associateduser null     , c.uniqueid <> cc.uniqueid 

which finding contracts f.e. first name , birthday matching. works great. want add synthetic attribute each row value 1 source row.

let me give example make clearer. suppose have following table:

uniqueid  | firstname | lastname  | birthday  1         | peter     | smith     | 1980-11-04 2         | peter     | gray      | 1980-11-04 3         | peter     | gray-smith| 1980-11-04 4         | frank     | may       | 1985-06-09 5         | frank-paul| may       | 1985-06-09 6         | gina      | ericson   | 1950-11-04 

the resulting view should this:

uniqueid | possiblymatchingcontracts | syntheticid  1        | 2                         | petersmith1980-11-04 1        | 3                         | petersmith1980-11-04 2        | 1                         | petersmith1980-11-04 2        | 3                         | petersmith1980-11-04 3        | 1                         | petersmith1980-11-04 3        | 2                         | petersmith1980-11-04 4        | 5                         | frankmay1985-06-09 5        | 4                         | frankmay1985-06-09 6        | null                      | null [or] ginaericson1950-11-04 

notice syntheticid column uses values 1 of matching source rows. doesn't matter one. exporting view application , need able identify each "match group" afterwards.

is clear mean? ideas how done in sql?

maybe helps elaborate bit on actual use case:

i importing contracts different systems. account possibility of typos or people have married last name updated in 1 system, need find called 'possible matches'. 2 or more contracts considered possible match if contain same birthday plus same first, last or birth name. implies, if contract matches contract b, contract b matches contract a.

the target system uses multivalue reference attributes store these relationships. ultimate goal create user objects these contracts. catch first is, shall 1 user object multiple matching contracts. i'm creating these matches in view. second catch is, creation of user objects happens workflows, run parallel each contract. avoid creating multiple user objects matching contracts, each workflow needs check, if there matching user object or workflow, create said user object. because workflow engine extremely slow compared sql, workflows should not repeat whole matching test. idea is, let workflow check 'syntheticid'.

i have solved multi step approach:

  1. create list of possible 1st level matches each contract
  2. create base groups list, assigning different group for each contract (as if not related anybody)
  3. iterate matches list updating group list when more contracts need added group
  4. recursively build syntheticid final group list
  5. output results

first of all, let me explain have understood, can tell if approach correct or not.

1) matching propagates in "cascade"

i mean, if "peter smith" grouped "peter gray", means smith , gray related (if have same birth date) luke smith can in same group of john gray

2) have not understood mean "birth name"

you contracts matches on "first, last or birth name", sorry, i'm italian, thought birth name , first same, in data there not such column. maybe related dash symbol between names?
when firstname frank-paul means should match both frank , paul?
when lastname gray-smith means should match both gray , smith?

in following code have ignored problem, handled if needed (i did try, breaking names, unpivoting them , treating double match).

step zero: declaration , prepare base data

declare @cli table (uniqueid int primary key, firstname varchar(20), lastname varchar(20), birthday varchar(20)) declare @comb table (id1 int, id2 int, done bit) declare @grp table (ix int identity primary key, grp int, id int, unique (grp,ix)) declare @str_id table (grp int primary key, syntheticid varchar(1000)) declare @id1 int, @g int  ;with t (     select *     (values     (1         , 'peter'     , 'smith'     , '1980-11-04'),     (2         , 'peter'     , 'gray'      , '1980-11-04'),     (3         , 'peter'     , 'gray-smith', '1980-11-04'),     (4         , 'frank'     , 'may'       , '1985-06-09'),     (5         , 'frank-paul', 'may'       , '1985-06-09'),     (6         , 'gina'      , 'ericson'   , '1950-11-04')     ) x (uniqueid  , firstname , lastname  , birthday) ) insert @cli select * t 

step one: create list of possible 1st level matches each contract

;with p as(select uniqueid, birthday, firstname, lastname @cli), m (     select p.uniqueid uniqueid1, p.firstname firstname1, p.lastname lastname1, p.birthday birthday1, pp.uniqueid uniqueid2, pp.firstname firstname2, pp.lastname lastname2, pp.birthday birthday2     p     join p pp on (pp.birthday=p.birthday) , (pp.firstname = p.firstname or pp.lastname = p.lastname)     p.uniqueid<=pp.uniqueid ) insert @comb select uniqueid1,uniqueid2,0 m 

step two: create base groups list

insert @grp select row_number() over(order id1), id1 @comb id1=id2 

step three: iterate matches list updating group list loop on contracts have possible matches , updates if needed

set @id1 = 0 while not(@id1 null) begin     set @id1 = (select top 1 id1 @comb id1<>id2 , done=0)      if not(@id1 null) begin          set @g = (select grp @grp id=@id1)         update g set grp= @g         @grp g         inner join @comb c on g.id = c.id2         c.id2<>@id1 , c.id1=@id1         , grp<>@g          update @comb set done=1 id1=@id1     end end 

step four: build syntheticid recursively add (distinct) first , last names of group syntheticid.
used '_' separator birth date, first names , last names, , ',' separator list of names avoid conflicts.

;with c as(     select c.*, g.grp     @cli c     join @grp g on g.id = c.uniqueid ), d (     select *, row_number() on (partition g order t,s) n1, row_number() on (partition g order t desc,s desc) n2     (         select distinct c.grp g, 1 t, firstname s c         union          select distinct c.grp, 2, lastname c          ) l ), r (     select d.*, cast(convert(varchar(10), t.birthday, 112) + '_' + s varchar(1000)) names, cast(0 bigint) i1, cast(0 bigint) i2     d     join @cli t on t.uniqueid=d.g     n1=1     union     select d.*, cast(r.names + iif(r.t<>d.t,'_',',') +  d.s varchar(1000)), r.n1, r.n2     d     join r on r.g = d.g , r.n1=d.n1-1  ) insert @str_id  select g, names r n2=1 

step five: output results

select c.uniqueid, case when id2=uniqueid id1 else id2 end possiblematchingcontract, s.syntheticid @cli c left join @comb cb on c.uniqueid in(id1,id2) , id1<>id2 left join @grp g on c.uniqueid = g.id left join @str_id s on s.grp = g.grp 

here results

uniqueid    possiblematchingcontract    syntheticid 1           2                           1980-11-04_peter_gray,gray-smith,smith 1           3                           1980-11-04_peter_gray,gray-smith,smith 2           1                           1980-11-04_peter_gray,gray-smith,smith 2           3                           1980-11-04_peter_gray,gray-smith,smith 3           1                           1980-11-04_peter_gray,gray-smith,smith 3           2                           1980-11-04_peter_gray,gray-smith,smith 4           5                           1985-06-09_frank,frank-paul_may 5           4                           1985-06-09_frank,frank-paul_may 6           null                        1950-11-04_gina_ericson 

i think in way resulting syntheticid should "unique" each group


Comments

Popular posts from this blog

asynchronous - C# WinSCP .NET assembly: How to upload multiple files asynchronously -

aws api gateway - SerializationException in posting new Records via Dynamodb Proxy Service in API -

asp.net - Problems sending emails from forum -