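Operating HDFS with the Java API: Detailed Examples
1. Traversing all files and folders under a directory
The listStatus method covers this requirement.
Its signature is as follows: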
    /**
     * List the statuses of the files/directories in the given path if the path is
     * a directory.
     *
     * @param f given path
     * @return the statuses of the files/directories in the given patch
     * @throws FileNotFoundException when the path does not exist;
     *         IOException see specific implementation
     */
    public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException,
                                                           IOException;
As the signature shows, listStatus takes a single Path argument and returns an array of FileStatus.
FileStatus holds the following information:
    /** Interface that represents the client side information for a file. */
    @InterfaceAudience.Public
    @InterfaceStability.Stable
    public class FileStatus implements Writable, Comparable {

      private Path path;
      private long length;
      private boolean isdir;
      private short block_replication;
      private long blocksize;
      private long modification_time;
      private long access_time;
      private FsPermission permission;
      private String owner;
      private String group;
      private Path symlink;
      ....
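From these fields it is easy to see that a FileStatus carries the file path, the length, whether the entry is a directory, block_replication, blocksize, and so on. The following example logs these values for every entry under a path: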
    import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
    import org.apache.spark.SparkContext
    import org.slf4j.LoggerFactory

    object HdfsOperation {

      val logger = LoggerFactory.getLogger(this.getClass)

      // Recursively log the metadata of every file and directory under `path`.
      def tree(sc: SparkContext, path: String): Unit = {
        val fs = FileSystem.get(sc.hadoopConfiguration)
        val fsPath = new Path(path)
        // listStatus returns only the direct children of fsPath.
        val status = fs.listStatus(fsPath)
        for (filestatus: FileStatus <- status) {
          logger.info("getPermission is: {}", filestatus.getPermission)
          logger.info("getOwner is: {}", filestatus.getOwner)
          logger.info("getGroup is: {}", filestatus.getGroup)
          logger.info("getLen is: {}", filestatus.getLen)
          logger.info("getModificationTime is: {}", filestatus.getModificationTime)
          logger.info("getReplication is: {}", filestatus.getReplication)
          logger.info("getBlockSize is: {}", filestatus.getBlockSize)
          if (filestatus.isDirectory) {
            val dirpath = filestatus.getPath.toString
            logger.info("directory path is: {}", dirpath)
            // Descend into the subdirectory.
            tree(sc, dirpath)
          } else {
            val fullname = filestatus.getPath.toString
            val filename = filestatus.getPath.getName
            logger.info("full file path is: {}", fullname)
            logger.info("file name is: {}", filename)
          }
        }
      }
    }
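When a FileStatus turns out to be a directory, tree calls itself recursively on that directory, so the whole subtree is eventually traversed.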
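The same recursion can be written without Spark, returning the paths instead of logging them. The following is a minimal sketch, not part of the original article: the names ListStatusSketch and walk are hypothetical, and it assumes hadoop-client is on the classpath. If no core-site.xml is found, the default Configuration resolves to the local file system, which makes the sketch easy to try out.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object ListStatusSketch {

  // Recursively collects the path of every file and directory below `dir`.
  def walk(fs: FileSystem, dir: Path): Seq[Path] =
    fs.listStatus(dir).toSeq.flatMap { status =>
      if (status.isDirectory) status.getPath +: walk(fs, status.getPath)
      else Seq(status.getPath)
    }

  def main(args: Array[String]): Unit = {
    // Assumption: fs.defaultFS comes from the classpath configuration,
    // falling back to the local file system when none is provided.
    val fs = FileSystem.get(new Configuration())
    val root = new Path(args.headOption.getOrElse("."))
    walk(fs, root).foreach(println)
  }
}
```

Returning a Seq keeps the traversal separate from whatever is done with the results, at the cost of holding all paths in memory for very large trees.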
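2. Traversing all files
The method above traverses both files and folders. To traverse files only, use the listFiles method; passing true as its second argument makes it descend into subdirectories.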
    // Defined inside HdfsOperation, since it uses the logger declared there.
    def findFiles(sc: SparkContext, path: String): Unit = {
      val fs = FileSystem.get(sc.hadoopConfiguration)
      val fsPath = new Path(path)
      // recursive = true: include files in all subdirectories.
      val files = fs.listFiles(fsPath, true)
      while (files.hasNext) {
        val filestatus = files.next()
        val fullname = filestatus.getPath.toString
        val filename = filestatus.getPath.getName
        logger.info("full file path is: {}", fullname)
        logger.info("file name is: {}", filename)
        logger.info("file size is: {}", filestatus.getLen)
      }
    }
    /**
     * List the statuses and block locations of the files in the given path.
     *
     * If the path is a directory,
     *   if recursive is false, returns files in the directory;
     *   if recursive is true, return files in the subtree rooted at the path.
     * If the path is a file, return the file's status and block locations.
     *
     * @param f is the path
     * @param recursive if the subdirectories need to be traversed recursively
     *
     * @return an iterator that traverses statuses of the files
     *
     * @throws FileNotFoundException when the path does not exist;
     *         IOException see specific implementation
     */
    public RemoteIterator<LocatedFileStatus> listFiles(
        final Path f, final boolean recursive)
        throws FileNotFoundException, IOException {
      ...
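As the source code shows, listFiles returns an iterator, RemoteIterator<LocatedFileStatus>, whereas listStatus returns an array. In addition, every entry that listFiles returns is a file, never a directory.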
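Because RemoteIterator is not a Scala collection, it cannot be used directly with map, filter, or sum. One possible approach, sketched below, is a thin adapter to scala.collection.Iterator. The names ListFilesSketch, asScala, and totalBytes are hypothetical and do not come from the original article or from Hadoop itself.

```scala
import org.apache.hadoop.fs.{FileSystem, Path, RemoteIterator}

// Hypothetical helper object, for illustration only.
object ListFilesSketch {

  // Adapts Hadoop's RemoteIterator to a Scala Iterator.
  // Any IOException thrown by the remote calls propagates unchanged.
  def asScala[T](remote: RemoteIterator[T]): Iterator[T] = new Iterator[T] {
    override def hasNext: Boolean = remote.hasNext()
    override def next(): T = remote.next()
  }

  // Sums the length of every file under `dir`, including subdirectories.
  def totalBytes(fs: FileSystem, dir: Path): Long =
    asScala(fs.listFiles(dir, true)).map(_.getLen).sum
}
```

The adapter stays lazy, so entries are still fetched as the iterator is consumed rather than loaded all at once.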
3. Creating a folder
    def mkdirToHdfs(sc: SparkContext, path: String): Unit = {
      val fs = FileSystem.get(sc.hadoopConfiguration)
      // mkdirs also creates any missing parent directories.
      val result = fs.mkdirs(new Path(path))
      if (result) {
        logger.info("mkdirs succeeded!")
      } else {
        logger.error("mkdirs failed!")
      }
    }
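mkdirs behaves roughly like the Unix command mkdir -p, so an already existing directory is not treated as an error and the boolean result alone does not say whether anything was created. If that distinction matters, the path can be inspected first. The sketch below is one hedged way to do it; MkdirSketch, Outcome, and ensureDir are hypothetical names, and the check followed by the create is not atomic, so another client could change the path in between.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object MkdirSketch {

  sealed trait Outcome
  case object Created extends Outcome
  case object AlreadyThere extends Outcome
  case object BlockedByFile extends Outcome
  case object Failed extends Outcome

  // Creates `dir` (and missing parents) and reports what actually happened.
  def ensureDir(fs: FileSystem, dir: Path): Outcome =
    if (fs.exists(dir)) {
      // Something is already at this path: a directory is fine, a file is not.
      if (fs.getFileStatus(dir).isDirectory) AlreadyThere else BlockedByFile
    } else if (fs.mkdirs(dir)) Created
    else Failed
}
```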
4. Deleting a folder
    def deleteOnHdfs(sc: SparkContext, path: String): Unit = {
      val fs = FileSystem.get(sc.hadoopConfiguration)
      // recursive = true: delete the directory together with its contents.
      val result = fs.delete(new Path(path), true)
      if (result) {
        logger.info("delete succeeded!")
      } else {
        logger.error("delete failed!")
      }
    }
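The second argument of delete controls recursion: with false, deleting a non-empty directory raises an IOException instead of removing its contents. A false return value can also simply mean that the path did not exist. The sketch below makes that case explicit; DeleteSketch and deleteIfExists are hypothetical names used only for illustration.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object DeleteSketch {

  // Returns true only if the path existed and was removed.
  // With recursive = false, a non-empty directory causes an IOException.
  def deleteIfExists(fs: FileSystem, target: Path, recursive: Boolean): Boolean =
    fs.exists(target) && fs.delete(target, recursive)
}
```

Passing recursive = false is a reasonable safety default when the caller expects a single file or an empty directory.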
5. Uploading a file
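    def uploadToHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
      // FileSystem.get normally returns a cached instance shared within the JVM
      // (including by Spark itself), so it is deliberately not closed here.
      val fs = FileSystem.get(sc.hadoopConfiguration)
      fs.copyFromLocalFile(new Path(localPath), new Path(hdfsPath))
    }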
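By default FileSystem.get hands out a cached instance that is shared across the JVM, so closing it can break other code that uses the same file system, for example with a "Filesystem closed" error. When an explicitly closed handle is wanted, one option is FileSystem.newInstance, which bypasses the cache. The sketch below assumes a plain Configuration rather than a SparkContext; UploadSketch and upload are hypothetical names.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object UploadSketch {

  def upload(conf: Configuration, localPath: String, hdfsPath: String): Unit = {
    // newInstance returns a private, uncached FileSystem, so closing it
    // does not affect other users of the cached instance.
    val fs = FileSystem.newInstance(conf)
    try {
      // delSrc = false keeps the local file;
      // overwrite = true replaces an existing destination file.
      fs.copyFromLocalFile(false, true, new Path(localPath), new Path(hdfsPath))
    } finally {
      fs.close()
    }
  }
}
```

Creating a new instance per call has a connection cost, so for frequent small operations reusing the cached instance without closing it may be the better trade-off.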
6. Downloading a file
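    def downloadFromHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
      // FileSystem.get normally returns a cached instance shared within the JVM
      // (including by Spark itself), so it is deliberately not closed here.
      val fs = FileSystem.get(sc.hadoopConfiguration)
      fs.copyToLocalFile(new Path(hdfsPath), new Path(localPath))
    }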
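The two-argument copyToLocalFile writes through Hadoop's checksummed local file system, which can leave a hidden .crc sidecar file next to the downloaded file. If that is unwanted, the four-argument overload accepts a useRawLocalFileSystem flag. The sketch below shows that overload; DownloadSketch and download are hypothetical names, and the FileSystem is passed in by the caller rather than obtained from a SparkContext.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object DownloadSketch {

  // Copies `hdfsPath` to `localPath` without producing a .crc sidecar file.
  def download(fs: FileSystem, hdfsPath: String, localPath: String): Unit =
    fs.copyToLocalFile(
      false,               // delSrc: keep the source file
      new Path(hdfsPath),
      new Path(localPath),
      true                 // useRawLocalFileSystem: skip local checksum files
    )
}
```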