performance - Postgres - Single aggregation done with sequential scan, many aggregations index scan -


i want query table , sum column of rows last day of month.

let's use following table example:

create table example(dt date, value int, other1 int, other2 int, other3 int); create index on example (as_of); 

my query looks this:

select dt, sum(value) example dt in (select date_trunc('month', d) + interval '1 month - 1 day'              generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt 

if @ query plan see it's performing sequential scan against table:

explain analyse select dt, sum(value) example dt in (select date_trunc('month', d) + interval '1 month - 1 day'              generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt   groupaggregate  (cost=825385.12..871490.30 rows=1536 width=12) (actual time=4323.887..6141.401 rows=56 loops=1)    group key: example.dt    ->  merge join  (cost=825385.12..863846.28 rows=1525732 width=12) (actual time=4323.811..6118.514 rows=101102 loops=1)          merge cond: (example.dt = ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)))          ->  sort  (cost=825312.64..832941.30 rows=3051464 width=12) (actual time=4323.585..5303.902 rows=3051464 loops=1)                sort key: example.dt                sort method: external merge  disk: 77512kb                ->  seq scan on example  (cost=0.00..392353.64 rows=3051464 width=12) (actual time=10.385..1748.592 rows=3051464 loops=1)          ->  sort  (cost=72.48..72.98 rows=200 width=8) (actual time=0.168..18.248 rows=101105 loops=1)                sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))                sort method: quicksort  memory: 27kb                ->  unique  (cost=59.84..64.84 rows=200 width=8) (actual time=0.108..0.143 rows=59 loops=1)                      ->  sort  (cost=59.84..62.34 rows=1000 width=8) (actual time=0.106..0.112 rows=59 loops=1)                            sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))                            sort method: quicksort  memory: 27kb                            ->  function scan on generate_series dates  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.042..0.097 rows=59 loops=1) 

however, if add additional sums query decides use index on dt:

explain analyse select dt, sum(value), sum(other1), sum(other2), sum(other3) example dt in (select date_trunc('month', d) + interval '1 month - 1 day'              generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt  hashaggregate  (cost=1005765.17..1005780.53 rows=1536 width=61) (actual time=225.249..225.276 rows=56 loops=1)    group key: l.as_of    ->  nested loop  (cost=60.27..975250.53 rows=1525732 width=61) (actual time=0.141..173.853 rows=101102 loops=1)          ->  unique  (cost=59.84..64.84 rows=200 width=8) (actual time=0.100..0.192 rows=59 loops=1)                ->  sort  (cost=59.84..62.34 rows=1000 width=8) (actual time=0.099..0.125 rows=59 loops=1)                      sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))                      sort method: quicksort  memory: 27kb                      ->  function scan on generate_series dates  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.031..0.080 rows=59 loops=1)          ->  index scan using dashboard_loanhistory_95daa586 on dashboard_loanhistory l  (cost=0.43..4856.06 rows=1987 width=61) (actual time=0.025..1.579 rows=1714 loops=59)                index cond: (as_of = (date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))  planning time: 0.228 ms  execution time: 225.379 ms 

what's going on here? want original query run using index on dt, rather not add additional aggregations query unnecessarily.

this based on comments question, particularly @joops answer. it's of hack requires index - don't understand why query planner won't use 1 on dt here works ¯\_(ツ)_/¯

i added partial index on dt column, restricting days possibly last day of month:

create index on example (dt) date_part('day', dt) in (28, 29, 30, 31); 

then altered query include predicate on day of month being in range:

select dt, sum(value) example date_part('day', dt) in (28, 29, 30, 31) ,       dt in (select date_trunc('month', d) + interval '1 month - 1 day' generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)); 

Comments

Popular posts from this blog

asynchronous - C# WinSCP .NET assembly: How to upload multiple files asynchronously -

aws api gateway - SerializationException in posting new Records via Dynamodb Proxy Service in API -

asp.net - Problems sending emails from forum -