Forcing the driver to run on a specific slave in a Spark standalone cluster running with "--deploy-mode cluster"
I am running a small Spark cluster of 2 EC2 instances (m4.xlarge).
So far I have been running the Spark master on one node and a single Spark slave (4 cores, 16g memory) on the other, deploying my Spark (streaming) app in client deploy-mode from the master. A summary of the settings is:
--executor-memory 16g
--executor-cores 4
--driver-memory 8g
--driver-cores 2
--deploy-mode client
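For reference, a minimal sketch of the full submit command under those settings (the master URL and application jar path are placeholders, not taken from the question):

spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  --executor-memory 16g \
  --executor-cores 4 \
  --driver-memory 8g \
  --driver-cores 2 \
  /path/to/streaming-app.jar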
This results in a single executor on the single slave running with 4 cores and 16GB memory. The driver runs "outside" of the cluster on the master node (i.e. it is not allocated resources by the master).
Ideally I'd like to use cluster deploy-mode so that I can take advantage of the --supervise option. I have started a second slave on the master node, giving it 2 cores and 8g memory (smaller allocated resources so as to leave space for the master daemon).
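As a sketch of how such a restricted worker can be started on the master node (the master hostname is a placeholder), the core and memory limits can be passed to the worker start script:

./sbin/start-slave.sh spark://<master-host>:7077 --cores 2 --memory 8g

Equivalently, SPARK_WORKER_CORES and SPARK_WORKER_MEMORY can be set in conf/spark-env.sh on that node before starting the worker.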
When I run my Spark job in cluster deploy-mode (using the same settings as above but with --deploy-mode cluster), around 50% of the time I get the desired deployment: the driver runs through the slave running on the master node (which has the right resources of 2 cores & 8GB), which leaves the original slave node free to allocate an executor of 4 cores & 16GB. The other 50% of the time the master runs the driver on the non-master slave node, which means the driver ends up on the node with 2 cores & 8GB memory, leaving no node with sufficient resources to start an executor (which requires 4 cores & 16GB).
Is there any way to force the Spark master to use a specific worker / slave for the driver? Given that Spark knows there are 2 slave nodes, one with 2 cores and the other with 4 cores, and that the driver needs 2 cores and the executor needs 4 cores, it would ideally work out the right optimal placement, but this doesn't seem to be the case.
Any ideas / suggestions gratefully received!
Thanks!
I can see this is an old question, but let me answer it anyway; someone might find it useful.
Add the

--driver-java-options="-Dspark.driver.host=<host>"

option to your spark-submit script when submitting the application, and Spark should deploy the driver to the specified host.
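For illustration, a sketch of how that could look in a full cluster-mode submission (the master URL, driver host and jar path are placeholders, not part of the original answer):

spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 16g \
  --executor-cores 4 \
  --driver-memory 8g \
  --driver-cores 2 \
  --driver-java-options="-Dspark.driver.host=<desired-driver-host>" \
  /path/to/streaming-app.jar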