Installing Hadoop on Windows 8 or 8.1

I was installing Hadoop 2.7.0 recently on a Windows platform (8.1) and thought I’d document the steps, as the procedure isn’t that obvious (the existing documentation on how to do it is outdated in a few places).

 

Basic info:

  • Official Apache Hadoop releases do not include Windows binaries, so you have to download sources and build a Windows package yourself.
  • Do not run the installation from within Cygwin. Cygwin is no longer required or supported.
  • I assume you have a JDK already installed (ver. 1.7+)
  • I assume you have Unix command-line tools (like: sh, mkdir, rm, cp, tar, gzip) installed as well. These tools must be present on your PATH (a quick sanity check is shown right after this list). They come with the Git for Windows package that can be downloaded from here, or you can use win-bash (here) or GnuWin32.
  • If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
  • Do not use Visual Studio Express (it does not support compiling for 64-bit).
  • Google’s Protocol Buffers must be installed in exactly version 2.5.0 (not newer; the version is a hard-coded dependency… weird).
  • Several tests that are executed while building the Hadoop Windows package require the “Create Symbolic Links” privilege. Therefore, the ‘mvn package’ command must be executed from the Command Line in “Administrator mode”.
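
A quick way to verify the JDK and the Unix tools from a plain cmd.exe prompt (not Cygwin), assuming both have already been added to your PATH:

    rem each of these should resolve without Cygwin being involved
    java -version
    echo %JAVA_HOME%
    where sh tar gzip cp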

 

Installation:

  1. Download the Hadoop sources tarball from here.
  2. Make sure JAVA_HOME is set up properly in your “Environment Variables” (in my case it was “c:\Program Files\Java\jdk1.8.0_40”)
  3. Download Maven binaries from here.
  4. Add Maven’s ‘bin’ folder to your PATH (in “Environment Variables”)
  5. Download Google’s Protocol Buffers in version 2.5.0 (no other version will work, not even 2.6.1) from here.
  6. Download and install CMake (Windows Installer) from here.
  7. Download and install “Visual Studio 2010 Professional” (Trial is enough) from here (Web Installer) or here (ISO Image)
  8. Alternatively to step 7 above, you can install “Windows SDK 8.1” from here.
  9. Add the location of the newly installed MSBuild.exe (c:\Windows\Microsoft.NET\Framework64\v4.0.30319) to your system PATH (in “Environment Variables”).
  10. Because you’ll be running the Maven ‘package’ goal from the Command Line (cmd.exe) in “Administrator mode” (aka “Elevated mode”), it is important that in steps 4 and 9 above you update the “PATH” in the “System variables” section, not in the “User variables” section.
  11. Run cmd in “Administrator Mode” and execute: “set Platform=x64” (assuming you want the 64-bit version; otherwise use “set Platform=Win32”)
  12. Now, while still in cmd, execute:
    mvn package -Pdist,native-win -DskipTests -Dtar
    
  13. After the build is complete, you should find the hadoop-2.7.0.tar.gz file in the “hadoop-2.7.0-src\hadoop-dist\target\” directory.
  14. Extract the newly created Hadoop Windows package to a directory of your choice (eg. c:\hdp\); one possible end-to-end sequence covering steps 11-14 is sketched right after this list.
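
Putting steps 11-14 together, a possible end-to-end sequence from the elevated prompt might look roughly like this (directory names are examples only, so adjust them to wherever you unpacked the sources; tar here is the GNU tar that ships with the Git/GnuWin32 tools mentioned earlier, and the tarball is assumed to contain a single top-level hadoop-2.7.0 folder):

    rem everything below runs in an elevated (Administrator) cmd.exe
    cd /d c:\hadoop-2.7.0-src
    set Platform=x64
    mvn package -Pdist,native-win -DskipTests -Dtar

    rem extract the package to c:\hdp, dropping the top-level hadoop-2.7.0 folder
    mkdir c:\hdp
    cd /d c:\hdp
    tar -xzf ../hadoop-2.7.0-src/hadoop-dist/target/hadoop-2.7.0.tar.gz --strip-components=1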

 

Testing:

  1. We’ll be configuring Hadoop for a Single Node (pseudo-distributed) Cluster.
  2. As part of configuring HDFS, update the following files:
    1. near the end of “\hdp\etc\hadoop\hadoop-env.cmd” add the following lines:
        set HADOOP_PREFIX=c:\hdp
        set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
        set YARN_CONF_DIR=%HADOOP_CONF_DIR%
        set PATH=%PATH%;%HADOOP_PREFIX%\bin
      
    2. modify “\hdp\etc\hadoop\core-site.xml” with the following:
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://0.0.0.0:19000</value>
        </property>
      </configuration>
      
    3. modify “\hdp\etc\hadoop\hdfs-site.xml” with:
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
      </configuration>
      
    4. Finally, make sure “\hdp\etc\hadoop\slaves” has the following entry:

        localhost
      
    5. and create the c:\tmp directory, as the default configuration puts HDFS metadata and data files under \tmp on the current drive
  3. As part of configuring YARN, update the following files:
    1. add the following entries to “\hdp\etc\hadoop\mapred-site.xml”, replacing %USERNAME% with your Windows user name:
      <configuration>
        <property>
          <name>mapreduce.job.user.name</name>
          <value>%USERNAME%</value>
        </property>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>yarn.apps.stagingDir</name>
          <value>/user/%USERNAME%/staging</value>
        </property>
        <property>
          <name>mapreduce.jobtracker.address</name>
          <value>local</value>
        </property>
      </configuration>
      
    2. modify “\hdp\etc\hadoop\yarn-site.xml” with:
      <configuration>
        <property>
          <name>yarn.server.resourcemanager.address</name>
          <value>0.0.0.0:8020</value>
        </property>
        <property>
          <name>yarn.server.resourcemanager.application.expiry.interval</name>
          <value>60000</value>
        </property>
        <property>
          <name>yarn.server.nodemanager.address</name>
          <value>0.0.0.0:45454</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.server.nodemanager.remote-app-log-dir</name>
          <value>/app-logs</value>
        </property>
        <property>
          <name>yarn.nodemanager.log-dirs</name>
          <value>/dep/logs/userlogs</value>
        </property>
        <property>
          <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
          <value>0.0.0.0</value>
        </property>
        <property>
          <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
          <value>0.0.0.0</value>
        </property>
        <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
        </property>
        <property>
          <name>yarn.log-aggregation.retain-seconds</name>
          <value>-1</value>
        </property>
        <property>
          <name>yarn.application.classpath</name>
          <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
        </property>
      </configuration>
      
  4. Because Hadoop doesn’t recognize JAVA_HOME from “Environment Variables” (and has problems with spaces in pathnames):
    1. copy your JDK to a directory without spaces in its path (eg. “c:\hdp\java\jdk1.8.0_40”)
    2. edit “\hdp\etc\hadoop\hadoop-env.cmd” and update
        set JAVA_HOME=c:\hdp\java\jdk1.8.0_40
      
    3. initialize Environment Variables by running cmd in “Administrator Mode” and executing: “c:\hdp\etc\hadoop\hadoop-env.cmd”
  5. Format the FileSystem
      c:\hdp\bin\hdfs namenode -format
    
  6. Start HDFS Daemons
      c:\hdp\sbin\start-dfs.cmd
    
  7. Start YARN Daemons
      c:\hdp\sbin\start-yarn.cmd
    
  8. Run an example YARN job
      c:\hdp\bin\yarn jar c:\hdp\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.0.jar wordcount c:\hdp\LICENSE.txt /out
    
  9. Check the following pages in your browser (a few extra sanity checks are shown right after this list):
      Resource Manager web UI:  http://localhost:8088
      NameNode web UI:  http://localhost:50070
      NodeManager web UI:  http://localhost:8042
    
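
If everything came up, a few extra sanity checks from the same elevated prompt might look like this (the /out path comes from the example job in step 8, and the stop scripts assume the standard sbin layout of the package built above):

    rem list the HDFS root and print the output of the wordcount job
    c:\hdp\bin\hdfs dfs -ls /
    c:\hdp\bin\hdfs dfs -cat /out/*

    rem confirm the NodeManager registered with the ResourceManager
    c:\hdp\bin\yarn node -list

    rem when you are done, stop the daemons
    c:\hdp\sbin\stop-yarn.cmd
    c:\hdp\sbin\stop-dfs.cmd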

 

Voilà.

 

 
