cluster analysis - Java clustering algorithm to handle both similarity and dissimilarity
I'm working on a Java project where I need to match user queries against several engines. Each engine has a method similarity(Object a, Object b) which returns: +1 if the objects surely match; -1 if the objects surely don't match; a float in between when there's uncertainty.
Example: the user searches for "Dragon Ball".
- Engine 1 returns "Dragon Ball", "Dragon Ball GT", "Dragon Ball Z", and claims each one is a different result (similarity=-1), no matter how similar their names look. This engine is accurate, so it has a high "weight" value.
- Engine 2 returns 100 different results. Some of them relate to DBZ, others to DBGT, etc. This engine claims they're all "quite similar" (similarity between 0.5 and 1).
- The system also queries several other engines (10+).
I'm looking for a way to build clusters out of this system. I need to ensure that values with a similarity near -1 end up in different clusters, even if many other values are similar to both of them.
Is there a well-known clustering algorithm that solves this problem? Is there a Java implementation available? Can I build it on my own, perhaps with the help of a support library? I'm good at Java (15+ years of experience) but I'm new to clustering.
Thank you!
The obvious approach is to use "1 - similarity" as a distance function, which then goes from 0 to 2, and add the per-engine distances up. Or use "1 + similarity" and take the product of these values, ... or, or, or, ...
But since you apparently trust the first score more, you may want to increase its influence. There is no mathematical solution to this; you have to choose the weights depending on your data and preferences. If you have training data, you can optimize the weights on it, and you may want to discard rankers if they don't work or are strongly correlated with others.
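To make the options above concrete, here is a minimal sketch of the three combinations for one pair of objects: the sum of "1 - similarity", the product of "1 + similarity", and a weighted mean that lets a trusted engine count more. The class name CombinedDistance, the method names, and the example weights are all made up for illustration; the inputs are assumed to be the per-engine similarity values in [-1, 1].

```java
/** Hypothetical helper: combines per-engine similarities in [-1, 1] for one pair of objects. */
public final class CombinedDistance {

    private CombinedDistance() {}

    /** Sum of (1 - similarity). Each term lies in [0, 2]; 0 means "surely match". */
    public static double sumOfDistances(double[] similarities) {
        double total = 0.0;
        for (double s : similarities) {
            total += 1.0 - s;
        }
        return total;
    }

    /**
     * Product of (1 + similarity). Note this is an affinity, not a distance:
     * larger means more similar, and a single engine reporting -1 forces it to 0.
     */
    public static double productOfAffinities(double[] similarities) {
        double product = 1.0;
        for (double s : similarities) {
            product *= 1.0 + s;
        }
        return product;
    }

    /** Weighted mean of (1 - similarity); the weights are an assumption you have to pick or tune. */
    public static double weightedDistance(double[] similarities, double[] weights) {
        if (similarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per engine is required");
        }
        double weighted = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weighted += weights[i] * (1.0 - similarities[i]);
            weightSum += weights[i];
        }
        if (weightSum <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        return weighted / weightSum;
    }

    public static void main(String[] args) {
        // Engine 1 says "surely different"; two other engines say "quite similar".
        double[] similarities = {-1.0, 0.8, 0.6};
        double[] weights = {5.0, 1.0, 1.0}; // illustrative: engine 1 is trusted most

        System.out.println("sum of distances:    " + sumOfDistances(similarities));
        System.out.println("product of affinity: " + productOfAffinities(similarities));
        System.out.println("weighted distance:   " + weightedDistance(similarities, weights));
    }
}
```

With the product form, one engine reporting -1 acts as a hard veto, which is close to the "must end up in different clusters" requirement. The weighted mean only pushes the pair apart in proportion to the weight you give that engine. Either result could then be fed to a clustering algorithm that accepts a precomputed distance or affinity matrix.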