Uncommonly common recommendation on Neo4j with Cypher


I am trying to implement a basic recommendation system on Neo4j. Basically, I have users and the artists liked by those users. I want to query "users who liked damien rice also liked these artists". That's easy, as follows:

MATCH (n:artist)<-[:likes]-(p:person)-[:likes]->(n2:artist {artist_name: "damien rice"})
RETURN n.artist_name, count(n) AS count
ORDER BY count DESC
LIMIT 30

Although this approach is kind of right, it returns coldplay, beatles and so on (artists who are popular with everyone), as follows:

n.artist_name        count
coldplay             6193
radiohead            5377
beatles              3998
death cab cutie      3647
muse                 3252
killers              3064
jack johnson         2966

I want to figure out the uncommonly common suggestions. My intended approach is to give coldplay a score by calculating (6193/totalnumberoflikesforcoldplay). For example, if a total of 61930 people liked coldplay, it has a score of 6193/61930 = 0.1, and I want to sort all artists by this score.

I tried the following:

MATCH (n:artist)<-[:likes]-(p:person)-[:likes]->(n2:artist {artist_name: "damien rice"})
MATCH (n2:artist {artist_name: "damien rice"})<-[:likes]-(p2:person)
RETURN n.artist_name, count(n)/count(n2) AS score
ORDER BY score DESC
LIMIT 30

But it takes forever. What kind of query should I write to get this result in an efficient way?

Edit: I realized that the query I tried above is not what I want. It calculates

numberofpeoplebothlikedcoldplay_damienrice/numberofpeoplelikeddamienrice
numberofpeoplebothlikedthebeatles_damienrice/numberofpeoplelikeddamienrice

and so on.

However, what I want to calculate is

numberofpeoplebothlikedcoldplay_damienrice/numberofpeoplelikedcoldplay
numberofpeoplebothlikedthebeatles_damienrice/numberofpeoplelikedthebeatles
...

So maybe it can be updated as:

MATCH (n:artist)<-[:likes]-(p:person)-[:likes]->(n2:artist {artist_name: "damien rice"})
MATCH (n2:artist {artist_name: n.name})<-[:likes]-(p2:person)
RETURN n.artist_name, count(p)/count(p2) AS score
ORDER BY score DESC
LIMIT 30

But now it returns "(no rows)" as the result.

Edit 2: As suggested, I updated my query as follows:

MATCH (p2:person)-[:likes]->(n:artist)<-[:likes]-(p:person)-[:likes]->(n2:artist {artist_name: "damien rice"})
RETURN n.artist_name, count(p)/count(p2) AS score
ORDER BY score DESC
LIMIT 30

But it still runs forever. By the way, I have 292516 artists, 359347 people, and 17549962 likes relationships between artists and people. You can assume that a :person can like an :artist only once, and that only :persons can like :artists.

There are some improvements we can make here.

It's helpful to understand why your query may be taking so long. Recall that Neo4j returns what amounts to rows of columns of data, and these rows are built up as you progress through the query. After your second MATCH, what is being built is rows consisting of n2, every combination of a person who likes n2 with every person who likes n2 (since the second MATCH creates a cartesian product on the same set of persons), and every other artist liked by these people. It's a highly inefficient query (n^2 at least in complexity), so a long or never-finishing execution time is to be expected.
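To see the row blow-up on a small scale, here is a toy in-memory sketch in Python rather than Cypher. The `likes` data and all names in it are invented for illustration; it only mimics how each MATCH multiplies the rows built so far, and it is not how Neo4j executes the query internally.

```python
# Toy model of the row blow-up; all data below is made up for illustration.
likes = {
    "ann": {"damien rice", "coldplay", "muse"},
    "bob": {"damien rice", "coldplay"},
    "cat": {"damien rice", "radiohead"},
    "dan": {"coldplay"},
}
target = "damien rice"

# People who like the target artist (the role of p, and later of p2).
fans = [person for person, artists in likes.items() if target in artists]

# First MATCH: one row per (fan, other artist liked by that fan).
first_match_rows = [(p, a) for p in fans for a in likes[p] if a != target]

# Second MATCH: every existing row is repeated once per fan (p2).
second_match_rows = [(p, a, p2) for (p, a) in first_match_rows for p2 in fans]

print(len(fans), len(first_match_rows), len(second_match_rows))
```

The last count is the second count multiplied by the number of fans, so with thousands of fans the second MATCH multiplies an already large row set by thousands.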

So let's fix this up.

First, we can get rid of your second MATCH entirely when calculating the number of likes for n2. Instead (assuming a :person can like an :artist only once, and only :persons can like :artists) we can count the number of :likes relationships directly. By reordering so that this happens first, we ensure the operation happens only once, on a single row of data, rather than being duplicated across a great number of rows. Then we can run your first MATCH.

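// Count the likes of n2 once, while there is still only a single row
MATCH (n2:artist {artist_name: "damien rice"})
WITH n2, size( (n2)<-[:likes]-() ) AS n2likes
// Then find the other artists liked by the same people
MATCH (n:artist)<-[:likes]-()-[:likes]->(n2)
WITH n, toFloat(count(n))/n2likes AS score
ORDER BY score DESC
LIMIT 30
RETURN n.artist_name, score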

Edited to address the clarified requirements. Also, I altered the queries to use float values for the count, so that the resulting score is a decimal rather than an int.
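As a worked example of the clarified score (people who like both artists, divided by everyone who likes the candidate artist), here is a small Python sketch on made-up data. The `likes` mapping and the function name `uncommonly_common_scores` are hypothetical and exist only for illustration; the last line shows why the float conversion matters.

```python
# Toy data, invented for illustration: person -> set of liked artists.
likes = {
    "ann": {"damien rice", "coldplay", "muse"},
    "bob": {"damien rice", "coldplay"},
    "cat": {"damien rice", "radiohead"},
    "dan": {"coldplay"},
    "eve": {"coldplay", "radiohead"},
}


def uncommonly_common_scores(likes, target):
    """Return {artist: liked_together_with_target / total_likes_of_artist}."""
    both = {}   # candidate artist -> people who like it and the target
    total = {}  # artist -> total number of likes
    for artists in likes.values():
        for artist in artists:
            total[artist] = total.get(artist, 0) + 1
            if target in artists and artist != target:
                both[artist] = both.get(artist, 0) + 1
    return {artist: both[artist] / total[artist] for artist in both}


scores = uncommonly_common_scores(likes, "damien rice")
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)

# Integer division truncates the score to 0; float division keeps it.
print(6193 // 61930, 6193 / 61930)
```

In this toy data coldplay has the highest raw co-like count, but muse ranks first because everyone who likes muse also likes damien rice.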

We can use a similar approach, getting the size() of the likes of each artist.

MATCH (n:artist)<-[:likes]-()-[:likes]->(n2:artist {artist_name: "damien rice"})
// People who like both n and n2, as a float
WITH n, toFloat(count(n)) AS likesbothcnt
// Total likes of n, counted once per artist
WITH n, likesbothcnt, size( ()-[:likes]->(n) ) AS likesartist
WITH n, likesbothcnt/likesartist AS score
ORDER BY score DESC
LIMIT 30
RETURN n.artist_name, score

However, this query will be slower than the first one proposed. One way to improve the speed is to cache a snapshot of the like count per artist on the artist node ahead of time, then use the cached value when you need the real-time calculation. You would need to figure out how and when to update the cached values, though.
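The caching idea can be sketched outside the database with a toy Python model. Everything here is an assumption for illustration: the `likes` data is made up, and `build_like_counts`, `add_like` and `score` are hypothetical helpers, not Neo4j APIs. In Neo4j the cached count would live as a property on each artist node instead of in a dictionary.

```python
from collections import Counter

# Toy data, invented for illustration: person -> set of liked artists.
likes = {
    "ann": {"damien rice", "coldplay", "muse"},
    "bob": {"damien rice", "coldplay"},
    "cat": {"damien rice", "radiohead"},
    "dan": {"coldplay"},
}


def build_like_counts(likes):
    """Snapshot: artist -> number of likes, computed once ahead of time."""
    return Counter(a for artists in likes.values() for a in artists)


def add_like(likes, like_counts, person, artist):
    """Record a new like and keep the cached count in step with it."""
    liked = likes.setdefault(person, set())
    if artist not in liked:  # a person likes an artist at most once
        liked.add(artist)
        like_counts[artist] += 1


def score(likes, like_counts, target):
    """Score candidates using the cached totals instead of recounting."""
    both = Counter(
        a
        for artists in likes.values()
        if target in artists
        for a in artists
        if a != target
    )
    return {a: n / like_counts[a] for a, n in both.items()}


like_counts = build_like_counts(likes)
before = score(likes, like_counts, "damien rice")

add_like(likes, like_counts, "eve", "radiohead")
after = score(likes, like_counts, "damien rice")
print(before["radiohead"], after["radiohead"])
```

Updating the cache on every write, as `add_like` does, keeps the counts exact at the cost of extra work per like; refreshing the snapshot on a schedule is cheaper but lets the scores drift between refreshes.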

