If you’ve been wondering whether storing files in Hadoop HDFS programmatically is difficult, I have good news – it’s not.
For this example I’ll be using my recent favorite language – Scala.
Here’s what you need to do:
- Start a new SBT project in IntelliJ
- Add the “hadoop-client” dependency (important: the client version must match the version of the Hadoop server you’ll be writing files to)
libraryDependencies ++= Seq( "org.apache.hadoop" % "hadoop-client" % "2.7.0" )
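For reference, a minimal build.sbt might look like the sketch below (the project name and Scala version are placeholders – adjust them to your setup):

name := "hdfs-write-example"
version := "0.1"
scalaVersion := "2.11.8"

// The client version must match your Hadoop server version
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-client" % "2.7.0"
)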
- Check the value of the “fs.defaultFS” property in your Hadoop configuration (/etc/hadoop/core-site.xml); in older configs the same setting is named “fs.default.name”. This is the URI you need in order to point the app code at your Hadoop cluster
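In core-site.xml the entry looks something like this (the host and port here are illustrative – use whatever your cluster reports):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>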
- Write a few lines of code
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object Hdfs extends App {
  def write(uri: String, filePath: String, data: Array[Byte]): Unit = {
    // The user the client authenticates as on the cluster
    System.setProperty("HADOOP_USER_NAME", "Mariusz")
    val path = new Path(filePath)
    val conf = new Configuration()
    // Point the client at the cluster – the URI from core-site.xml
    conf.set("fs.defaultFS", uri)
    val fs = FileSystem.get(conf)
    val os = fs.create(path)
    os.write(data)
    // Close the stream before the filesystem so the data is flushed to HDFS
    os.close()
    fs.close()
  }
}
- Use the code written above
Hdfs.write("hdfs://0.0.0.0:19000", "test.txt", "Hello World".getBytes)
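If you want to verify that the file actually landed, a small read helper does the trick. Here’s a minimal sketch to drop into the Hdfs object above (the read method and its name are my own illustration, not something the original example ships with):

import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import org.apache.hadoop.io.IOUtils

// Reads a whole HDFS file back into memory as a String
def read(uri: String, filePath: String): String = {
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  val fs = FileSystem.get(conf)
  val in = fs.open(new Path(filePath))
  val out = new ByteArrayOutputStream()
  try {
    IOUtils.copyBytes(in, out, conf) // stream the file contents into the buffer
  } finally {
    in.close()
    fs.close()
  }
  new String(out.toByteArray, StandardCharsets.UTF_8)
}

Calling println(Hdfs.read("hdfs://0.0.0.0:19000", "test.txt")) should then print “Hello World”.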
That’s all there is to it, really.
Cheers 🙂