Tag Archives: Scala

Writing files to Hadoop HDFS using Scala

If you’ve been wondering whether storing files in Hadoop HDFS programmatically is difficult, I have good news – it’s not.

For the purpose of this example i’ll be using my favorite (recently) language – Scala.


Here’s what you need to do:

  1. Start a new SBT project in IntelliJ
  2. Add the “hadoop-client” dependency (Important: You must use the same version of the client, as is the version of the Hadoop server you’ll be writing files to)
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % "2.7.0"
  3. Check in Hadoop configuration the value of “fs.default.name” property (/etc/hadoop/core-site.xml). This will be the URI you need in order to point the app code at your Hadoop Cluster
  4. Write few lines of code
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    object Hdfs extends App {
      def write(uri: String, filePath: String, data: Array[Byte]) = {
        System.setProperty("HADOOP_USER_NAME", "Mariusz")
        val path = new Path(filePath)
        val conf = new Configuration()
        conf.set("fs.defaultFS", uri)
        val fs = FileSystem.get(conf)
        val os = fs.create(path)
  5. Use the code written above
      Hdfs.write("hdfs://", "test.txt", "Hello World".getBytes)


That’s all there is to it, really

Cheers 🙂