Tag Archives: Scala

Writing files to Hadoop HDFS using Scala

If you’ve been wondering whether storing files in Hadoop HDFS programmatically is difficult, I have good news – it’s not.

For the purpose of this example i’ll be using my favorite (recently) language – Scala.

 

Here’s what you need to do:

  1. Start a new SBT project in IntelliJ
  2. Add the “hadoop-client” dependency (Important: You must use the same version of the client, as is the version of the Hadoop server you’ll be writing files to)
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % "2.7.0"
    )
    
  3. Check in Hadoop configuration the value of “fs.default.name” property (/etc/hadoop/core-site.xml). This will be the URI you need in order to point the app code at your Hadoop Cluster
  4. Write few lines of code
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    object Hdfs extends App {
    
      def write(uri: String, filePath: String, data: Array[Byte]) = {
        System.setProperty("HADOOP_USER_NAME", "Mariusz")
        val path = new Path(filePath)
        val conf = new Configuration()
        conf.set("fs.defaultFS", uri)
        val fs = FileSystem.get(conf)
        val os = fs.create(path)
        os.write(data)
        fs.close()
      }
    }
    
  5. Use the code written above
      Hdfs.write("hdfs://0.0.0.0:19000", "test.txt", "Hello World".getBytes)
    

 

That’s all there is to it, really

Cheers 🙂

Advertisements