Cluster analysis - Java clustering algorithm to handle both similarity and dissimilarity

August 15, 2013

I'm working on a Java project where I need to match user queries against several engines. Each engine has a method similarity(Object a, Object b) which returns: +1 if the objects surely match; -1 if the objects surely don't match; any float in between when there's uncertainty.

Example: a user searches for "dragon ball".

Engine 1 returns "Dragon Ball", "Dragon Ball GT" and "Dragon Ball Z", and claims that each is a different result (similarity = -1), no matter how similar the names look. This engine is accurate and has a high "weight" value.
Engine 2 returns 100 different results. Some of them relate to DBZ, others to DBGT, etc. This engine claims they're "quite similar" (similarity between 0.5 and 1).
The system also queries several other engines (10+).

I'm looking for a way to build clusters out of this system. I need to ensure that values with a similarity near -1 end up in different clusters, even if many other values are similar to both of them.

Is there a well-known clustering algorithm to solve this problem? Is there a Java implementation available? Can I build it on my own, perhaps with the help of a support library? I'm good at Java (15+ years of experience) but I'm new to clustering.

Thank you!

The obvious approach would be to use "1 - similarity" as a distance function, which will then go from 0 to 2. Then add the per-engine distances up.
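As a rough sketch of that idea (the class and method names here are made up for illustration, and it assumes you have already collected one similarity value per engine for a given pair of objects):

```java
/** Hypothetical helper: sums per-engine distances for one pair of objects. */
final class SumDistance {

    /** Maps a similarity in [-1, 1] to a distance in [0, 2]. */
    static double toDistance(double similarity) {
        return 1.0 - similarity;
    }

    /** Adds up the distances reported by all engines for the same pair. */
    static double combined(double[] similarities) {
        double total = 0.0;
        for (double s : similarities) {
            total += toDistance(s);
        }
        return total;
    }

    public static void main(String[] args) {
        // One engine says "surely different", two others say "quite similar".
        double[] sims = {-1.0, 0.8, 0.5};
        System.out.println(combined(sims));
    }
}
```

With n engines the combined value ranges from 0 to 2n, so a single "-1" vote can still be outvoted by many mildly positive engines. That is worth keeping in mind given the requirement in the question.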

Or you could use 1 + similarity and take the product of these values, ... or, or, or, ...
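The product variant could look something like this (again, the names are only illustrative). Note that the result is a similarity-like score, where higher means more alike, not a distance:

```java
/** Hypothetical helper: multiplies (1 + similarity) over all engines. */
final class ProductScore {

    /**
     * Each factor lies in [0, 2]. A single engine reporting -1 contributes
     * a factor of 0, which forces the whole product to 0.
     */
    static double combined(double[] similarities) {
        double product = 1.0;
        for (double s : similarities) {
            product *= (1.0 + s);
        }
        return product;
    }

    public static void main(String[] args) {
        // The -1 from the first engine acts as a veto here.
        System.out.println(combined(new double[]{-1.0, 0.9, 1.0}));
    }
}
```

One property of this form is that any engine returning exactly -1 zeroes the score, regardless of what the other engines say. A value merely close to -1 only shrinks the product, it does not veto it.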

But since you apparently trust the first score more, you may want to increase its influence. There is no mathematical solution to this; you have to choose the weights depending on your data and preferences. If you have training data, you can use it to optimize the weights, and you may also want to discard rankers if they don't work well or are correlated with each other.
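One possible way to give trusted engines more influence is a weighted average of the per-engine distances. This is just a sketch under my own assumptions: the weights array is something you would have to pick or tune yourself, and normalising by the total weight is a design choice, not something the problem dictates.

```java
/** Hypothetical helper: weighted average of per-engine distances. */
final class WeightedDistance {

    /**
     * Returns a value in [0, 2]. weights[i] is the (non-negative) trust
     * placed in engine i; similarities[i] is that engine's score in [-1, 1].
     */
    static double combined(double[] similarities, double[] weights) {
        if (similarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per engine is required");
        }
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (weights[i] < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            weighted += weights[i] * (1.0 - similarities[i]);
            totalWeight += weights[i];
        }
        if (totalWeight == 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return weighted / totalWeight;
    }

    public static void main(String[] args) {
        double[] sims = {-1.0, 0.8};
        // Trusting the first engine five times as much pulls the result towards 2.
        System.out.println(combined(sims, new double[]{5.0, 1.0}));
        System.out.println(combined(sims, new double[]{1.0, 1.0}));
    }
}
```

Dropping a ranker that turns out to be useless or redundant then simply means setting its weight to 0.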
