In this article, we will teach you how to install Hadoop on Debian 10, step by step. Hadoop is an open-source software framework that enables distributed processing of large datasets across clusters of servers. The framework is written in Java and is designed to scale to thousands of machines with high fault tolerance. Instead of relying on expensive hardware, these clusters achieve fault tolerance through the software's ability to detect and handle failures at the application layer. Leading users of Hadoop include Facebook and Yahoo.
If you intend to buy a Linux VPS server, we suggest the plans provided on our website, which come with immediate delivery.
First, add the OpenJDK repository:
sudo add-apt-repository ppa:openjdk-r/ppa
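Note that on a fresh Debian installation the add-apt-repository command may not be available; it is provided by the software-properties-common package, which you can install first:
sudo apt install software-properties-common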
Update system packages with the following command:
sudo apt update
Install OpenJDK 8 by entering the following command:
sudo apt install openjdk-8-jdk
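To confirm that the JDK was installed correctly, check the Java version:
java -version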
In this step, install SSH using the following command:
sudo apt install ssh
You can now install rsync:
sudo apt install rsync
Set up SSH without a passphrase using the following command (press Enter at each prompt to accept the defaults and leave the passphrase empty):
ssh-keygen -t rsa
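If you prefer a non-interactive setup, the same key can be generated in one step with an empty passphrase:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa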
Next, append the public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Note 1: If SSH refuses the connection to localhost on port 22, restart the SSH service first:
sudo service ssh restart
Note 2: Connection failures can also be caused by incorrect permissions, so fix the permissions on the ~/.ssh directory and its files with the chmod command:
chmod -R 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
chmod 644 ~/.ssh/known_hosts
chmod 644 ~/.ssh/config
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
Now test the SSH connection again by entering the following command:
ssh localhost
Next, download Apache Hadoop.
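For example, you can download version 3.2.1 (the version used in this tutorial) from the Apache archive with wget; if you prefer a different mirror or version, adjust the URL accordingly:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz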
After the download finishes, extract the archive with the following command:
tar -xzf hadoop-3.2.1.tar.gz
Now you need to copy the Hadoop folder to your desired location and rename it.
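For example, to match the HADOOP_HOME path used below, you could move the extracted folder into your home directory (this target path is an assumption; adjust it to your setup):
mv hadoop-3.2.1 ~/hadoop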
In this step, you need to edit the .bashrc file in your home directory (~/.bashrc) and append the code given below. Note that you must change the username in HADOOP_HOME to your own username.
#for hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
export HADOOP_HOME=/home/username/hadoop #location of your hadoop file directory
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_USER_CLASSPATH_FIRST=true
alias hadoop=$HADOOP_HOME/bin/hadoop #for convenience
alias hdfs=$HADOOP_HOME/bin/hdfs #for convenience
#done
To find the JDK path used for the JAVA_HOME line above, run the following command (JAVA_HOME is the directory above the bin/java part of the output):
readlink -f $(which java)
Now reload the .bashrc file by entering the following command to apply the changes:
source ~/.bashrc
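At this point, the hadoop alias defined in .bashrc should work; you can verify it by printing the Hadoop version:
hadoop version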
At this point, start editing the configuration files in hadoop/etc/hadoop. First, add the following to core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Next, add the following to hdfs-site.xml. Do not forget to replace username with your own username (run the whoami command in the terminal if you are unsure):
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/username/pseudo/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/username/pseudo/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
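Hadoop creates these directories itself (the NameNode directory during formatting and the DataNode directory at startup), but you can also create them in advance if you prefer; the paths below assume the same home-directory layout as the configuration above:
mkdir -p ~/pseudo/dfs/name ~/pseudo/dfs/data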
Add the following to mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
Finally, add the following line to hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 #JAVA_JDK directory
If you need to look up the JDK path again, run:
readlink -f $(which java)
In this step, you must format the HDFS NameNode by running the following command:
hdfs namenode -format
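If the format succeeds, the output should end with a line reporting that the storage directory configured in dfs.name.dir has been successfully formatted.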
To run Hadoop, just enter the following command:
$HADOOP_HOME/sbin/start-all.sh
Now open http://localhost:9870 in your browser to see the NameNode web interface and confirm that Hadoop is working.
Note that the NameNode UI moved from http://localhost:50070 to http://localhost:9870 because Hadoop 3.0.0-alpha1 changed the default port configuration.
The following command is used to check the running Hadoop processes:
jps
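If all daemons started correctly, the output of jps should look roughly like the following (the process IDs here are examples and will differ on your machine):
12345 NameNode
12456 DataNode
12567 SecondaryNameNode
12678 ResourceManager
12789 NodeManager
12890 Jps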
Use the following command to stop Hadoop:
$HADOOP_HOME/sbin/stop-all.sh
After the machine reboots, you can start Hadoop again by entering the following command:
$HADOOP_HOME/sbin/start-all.sh
The ResourceManager web interface, which shows all applications in the cluster, is available on port 8088 by default:
http://localhost:8088/
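As a quick smoke test (the directory name here is arbitrary), you can create a directory in HDFS and list the root directory to confirm the filesystem is working:
hdfs dfs -mkdir /test
hdfs dfs -ls /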
Hadoop is an open-source framework for storing and processing large datasets across distributed clusters. In this article, we introduced Hadoop and walked through its installation in full. If you are planning to install Hadoop but do not know how to do so on Debian 10, you can easily follow this step-by-step tutorial.