Note: If the URL does not work, please go to the Apache Spark download page to check for the latest version. Remember to replace the Spark version number in the subsequent commands if you change the download URL.

Now, extract the saved archive using tar:

tar xvf spark-*

The output shows the files that are being unpacked from the archive.

Finally, move the unpacked directory spark-3.0.1-bin-hadoop2.7 to the /opt/spark directory. Use the mv command to do so:

sudo mv spark-3.0.1-bin-hadoop2.7 /opt/spark

The terminal returns no response if it successfully moves the directory. If you mistype the name, you will get a message similar to:

mv: cannot stat 'spark-3.0.1-bin-hadoop2.7': No such file or directory
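As an optional sanity check (not part of the original steps), you can list the new Spark home to confirm the move succeeded:

# Should show the standard Spark layout: bin, sbin, conf, jars, and so on
ls /opt/spark

If the directory does not exist, repeat the mv step using the exact directory name that tar produced.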
Configure Spark Environment

Before starting a master server, you need to configure environment variables. There are a few Spark home paths you need to add to the user profile.

Use the echo command to add these three lines to .profile:

echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.profile
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile

Use >> rather than > so each line is appended to .profile instead of overwriting it, and keep the lines in single quotes so $PATH and $SPARK_HOME expand when the profile is loaded, not when you run echo.

You can also add the export paths by editing the .profile file in the editor of your choice, such as nano or vim. For example, to use nano, enter:

nano .profile
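To apply the changes in the current shell session rather than waiting for the next login, you can reload the profile and confirm the variables; a minimal check, assuming the three lines above were appended to ~/.profile:

# Reload the profile in the current session
source ~/.profile

# Verify the variables: should print /opt/spark and /usr/bin/python3
echo $SPARK_HOME
echo $PYSPARK_PYTHON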