sql - Hardcoding function parameter yields 5x speed up -
i have following stored procedure generate dynamic query.
given list of conditions/filters, finds visitors
belong given app
. app_id
passed in argument.
if call function app id, , use argument in dynamic query, runs in around 200ms.
however, if hardcode app_id
, runs in < 20ms.
here example of how invoke procedure
select id find_matching_visitors('my_app_id', '{}', '{( field = ''app_name'' , string_value ilike ''my awesome app'' )}')
any ideas ideas why?
create or replace function find_matching_visitors(app_id text, default_filters text[], custom_filters text[]) returns table ( id varchar ) $body$ declare default_filterstring text; custom_filterstring text; default_filter_length integer; custom_filter_length integer; sql varchar; begin default_filter_length := coalesce(array_length(default_filters, 1), 0); custom_filter_length := coalesce(array_length(custom_filters, 1), 0); default_filterstring := array_to_string(default_filters, ' , '); custom_filterstring := array_to_string(custom_filters, ' or '); if custom_filterstring = '' or custom_filterstring null custom_filterstring := '1=1'; end if; if default_filterstring = '' or default_filterstring null default_filterstring := '1=1'; end if; sql := format(' select v.id visitors v left join trackings t on v.id = t.visitor_id v.app_id = ''hardcoded_app_id'' , (%s) , (%s) group v.id ', custom_filterstring, default_filterstring, custom_filter_length, custom_filter_length); return query execute sql; end; $body$ language 'plpgsql';
analyze without hardcoding app_id
limit (cost=481.86..481.99 rows=50 width=531) (actual time=163.579..163.581 rows=9 loops=1) 2 -> sort (cost=481.86..484.26 rows=960 width=531) (actual time=163.578..163.579 rows=9 loops=1) 3 sort key: v0.last_seen desc 4 sort method: quicksort memory: 30kb 5 -> windowagg (cost=414.62..449.97 rows=960 width=531) (actual time=163.553..163.560 rows=9 loops=1) 6 -> hash join (cost=414.62..437.97 rows=960 width=523) (actual time=163.525..163.537 rows=9 loops=1) 7 hash cond: ((find_matching_visitors.id)::text = (v0.id)::text) 8 -> function scan on find_matching_visitors (cost=0.25..10.25 rows=1000 width=32) (actual time=153.918..153.918 rows=9 loops=1) 9 -> hash (cost=354.19..354.19 rows=4814 width=523) (actual time=9.578..9.578 rows=4887 loops=1) 10 buckets: 8192 batches: 1 memory usage: 2145kb 11 -> seq scan on visitors v0 (cost=0.00..354.19 rows=4814 width=523) (actual time=0.032..4.993 rows=4887 loops=1) 12 filter: ((not merged) , (((type)::text = 'user'::text) or ((type)::text = 'lead'::text))) 13 rows removed filter: 138 14 planning time: 1.134 ms 15 execution time: 163.705 ms
analyze when hardcoding app_id
limit (cost=481.86..481.99 rows=50 width=531) (actual time=25.890..25.893 rows=9 loops=1) 2 -> sort (cost=481.86..484.26 rows=960 width=531) (actual time=25.888..25.890 rows=9 loops=1) 3 sort key: v0.last_seen desc 4 sort method: quicksort memory: 30kb 5 -> windowagg (cost=414.62..449.97 rows=960 width=531) (actual time=25.862..25.870 rows=9 loops=1) 6 -> hash join (cost=414.62..437.97 rows=960 width=523) (actual time=25.830..25.841 rows=9 loops=1) 7 hash cond: ((find_matching_visitors.id)::text = (v0.id)::text) 8 -> function scan on find_matching_visitors (cost=0.25..10.25 rows=1000 width=32) (actual time=15.875..15.876 rows=9 loops=1) 9 -> hash (cost=354.19..354.19 rows=4814 width=523) (actual time=9.936..9.936 rows=4887 loops=1) 10 buckets: 8192 batches: 1 memory usage: 2145kb 11 -> seq scan on visitors v0 (cost=0.00..354.19 rows=4814 width=523) (actual time=0.013..5.232 rows=4887 loops=1) 12 filter: ((not merged) , (((type)::text = 'user'::text) or ((type)::text = 'lead'::text))) 13 rows removed filter: 138 14 planning time: 0.772 ms 15 execution time: 26.006 ms
update 1: added explain both cases. note: they're exact same plans, time spent varies
update 2: turned out needed pass app_id
argument format function, instead of embedding directly. took query time down around 20/30ms
the hard coded values important determine optimal query plan. example:
select * some_table id_person=231 select * some_table id_person=10
when 90% of some_table has id_person=231 pg uses full table scan, because that's fastest. when 1% of records has id_person=10 , uses index scan. plan used depends on value of parameter.
when using not hard coded value, e.g.
select * some_table id_person=?
it isn't able determine optimal query plan , query can slower.
Comments
Post a Comment