Writing files to Hadoop HDFS using Scala

If you’ve been wondering whether writing files to Hadoop HDFS programmatically is difficult, I have good news: it’s not.

For this example I’ll be using my recent favorite language: Scala.


Here’s what you need to do:

  1. Start a new SBT project in IntelliJ
  2. Add the “hadoop-client” dependency (Important: the hadoop-client version must match the version of the Hadoop cluster you’ll be writing files to)
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % "2.7.0"
    )
  3. Check the value of the “fs.default.name” property in the Hadoop configuration (/etc/hadoop/core-site.xml). This is the URI you’ll need in order to point the application code at your Hadoop cluster
  4. Write a few lines of code
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object Hdfs extends App {
      def write(uri: String, filePath: String, data: Array[Byte]) = {
        System.setProperty("HADOOP_USER_NAME", "Mariusz")
        val path = new Path(filePath)
        val conf = new Configuration()
        conf.set("fs.defaultFS", uri)
        val fs = FileSystem.get(conf)
        val os = fs.create(path)
        os.write(data) // write the payload
        os.close()     // flush and close the output stream
        fs.close()
      }
    }
  5. Use the code written above
      Hdfs.write("hdfs://", "test.txt", "Hello World".getBytes)
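For reference, the “fs.default.name” property from step 3 was renamed to “fs.defaultFS” in Hadoop 2.x (the old name still works as a deprecated alias, which is why the code above sets the new one). A typical core-site.xml entry looks like this — the hostname and port are placeholders you’d replace with your own cluster’s values:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
</configuration>
```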


That’s all there is to it, really.
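If you want to confirm the write actually succeeded, a similarly short method will read the file back. This is just a sketch along the same lines as the write example — it assumes the same URI and path, and of course needs a running cluster to talk to:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

object HdfsRead extends App {
  def read(uri: String, filePath: String): String = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", uri)
    val fs = FileSystem.get(conf)
    // fs.open returns an FSDataInputStream, a plain java.io.InputStream
    val is = fs.open(new Path(filePath))
    try Source.fromInputStream(is).mkString
    finally {
      is.close()
      fs.close()
    }
  }
}
```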

Cheers 🙂



