performance - Postgres - Single aggregation done with sequential scan, many aggregations index scan -
i want query table , sum column of rows last day of month.
let's use following table example:
create table example(dt date, value int, other1 int, other2 int, other3 int); create index on example (as_of);
my query looks this:
select dt, sum(value) example dt in (select date_trunc('month', d) + interval '1 month - 1 day' generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt
if @ query plan see it's performing sequential scan against table:
explain analyse select dt, sum(value) example dt in (select date_trunc('month', d) + interval '1 month - 1 day' generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt groupaggregate (cost=825385.12..871490.30 rows=1536 width=12) (actual time=4323.887..6141.401 rows=56 loops=1) group key: example.dt -> merge join (cost=825385.12..863846.28 rows=1525732 width=12) (actual time=4323.811..6118.514 rows=101102 loops=1) merge cond: (example.dt = ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))) -> sort (cost=825312.64..832941.30 rows=3051464 width=12) (actual time=4323.585..5303.902 rows=3051464 loops=1) sort key: example.dt sort method: external merge disk: 77512kb -> seq scan on example (cost=0.00..392353.64 rows=3051464 width=12) (actual time=10.385..1748.592 rows=3051464 loops=1) -> sort (cost=72.48..72.98 rows=200 width=8) (actual time=0.168..18.248 rows=101105 loops=1) sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)) sort method: quicksort memory: 27kb -> unique (cost=59.84..64.84 rows=200 width=8) (actual time=0.108..0.143 rows=59 loops=1) -> sort (cost=59.84..62.34 rows=1000 width=8) (actual time=0.106..0.112 rows=59 loops=1) sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)) sort method: quicksort memory: 27kb -> function scan on generate_series dates (cost=0.01..10.01 rows=1000 width=8) (actual time=0.042..0.097 rows=59 loops=1)
however, if add additional sums query decides use index on dt
:
explain analyse select dt, sum(value), sum(other1), sum(other2), sum(other3) example dt in (select date_trunc('month', d) + interval '1 month - 1 day' generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d)) group dt hashaggregate (cost=1005765.17..1005780.53 rows=1536 width=61) (actual time=225.249..225.276 rows=56 loops=1) group key: l.as_of -> nested loop (cost=60.27..975250.53 rows=1525732 width=61) (actual time=0.141..173.853 rows=101102 loops=1) -> unique (cost=59.84..64.84 rows=200 width=8) (actual time=0.100..0.192 rows=59 loops=1) -> sort (cost=59.84..62.34 rows=1000 width=8) (actual time=0.099..0.125 rows=59 loops=1) sort key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)) sort method: quicksort memory: 27kb -> function scan on generate_series dates (cost=0.01..10.01 rows=1000 width=8) (actual time=0.031..0.080 rows=59 loops=1) -> index scan using dashboard_loanhistory_95daa586 on dashboard_loanhistory l (cost=0.43..4856.06 rows=1987 width=61) (actual time=0.025..1.579 rows=1714 loops=59) index cond: (as_of = (date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)) planning time: 0.228 ms execution time: 225.379 ms
what's going on here? want original query run using index on dt
, rather not add additional aggregations query unnecessarily.
this based on comments question, particularly @joops answer. it's of hack requires index - don't understand why query planner won't use 1 on dt
here works ¯\_(ツ)_/¯
i added partial index on dt
column, restricting days possibly last day of month:
create index on example (dt) date_part('day', dt) in (28, 29, 30, 31);
then altered query include predicate on day of month being in range:
select dt, sum(value) example date_part('day', dt) in (28, 29, 30, 31) , dt in (select date_trunc('month', d) + interval '1 month - 1 day' generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d));
Comments
Post a Comment