If you’ve been wondering whether storing files in Hadoop HDFS programmatically is difficult, I have good news – it’s not.
For this example I’ll be using my (recently) favorite language – Scala.
Here’s what you need to do:
- Start a new SBT project in IntelliJ
- Add the “hadoop-client” dependency (Important: use the same client version as the version of the Hadoop server you’ll be writing files to)
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-client" % "2.7.0"
)
- Check the value of the “fs.defaultFS” property (named “fs.default.name” in older Hadoop versions) in the Hadoop configuration (/etc/hadoop/core-site.xml). This is the URI you need in order to point the app code at your Hadoop cluster
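For reference, the relevant entry in core-site.xml typically looks like the fragment below (the host and port here just mirror the URI used later in this post – substitute your own):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
```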
- Write a few lines of code
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object Hdfs extends App {
  def write(uri: String, filePath: String, data: Array[Byte]): Unit = {
    // The user HDFS will attribute the write to; set before FileSystem.get
    System.setProperty("HADOOP_USER_NAME", "Mariusz")
    val path = new Path(filePath)
    val conf = new Configuration()
    conf.set("fs.defaultFS", uri)
    val fs = FileSystem.get(conf)
    val os = fs.create(path)
    try os.write(data)
    finally os.close() // close the stream so the data is flushed to HDFS
    fs.close()
  }
}
- Use the code written above
Hdfs.write("hdfs://0.0.0.0:19000", "test.txt", "Hello World".getBytes)
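If you want to sanity-check the write, reading the file back works the same way. Here’s a minimal sketch against the same hadoop-client API – the `read` helper is my own addition, not part of the snippet above, and it assumes a reachable cluster at the given URI:

```scala
import java.io.ByteArrayOutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

// Hypothetical companion to Hdfs.write: reads an HDFS file back as bytes
def read(uri: String, filePath: String): Array[Byte] = {
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  val fs = FileSystem.get(conf)
  val in = fs.open(new Path(filePath))
  val out = new ByteArrayOutputStream()
  try IOUtils.copyBytes(in, out, conf, false) // false: don't auto-close the streams
  finally in.close()
  fs.close()
  out.toByteArray
}

// e.g. new String(read("hdfs://0.0.0.0:19000", "test.txt"), "UTF-8")
```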
That’s all there is to it, really.
Cheers 🙂