Reading HDFS and local files in Java

I want to read file paths regardless of whether they are HDFS or local. Currently, I pass local paths with the file:// prefix and HDFS paths with the hdfs:// prefix, and write code like the following:

    Configuration configuration = new Configuration();
    FileSystem fileSystem = null;
    if (filePath.startsWith("hdfs://")) {
        fileSystem = FileSystem.get(configuration);
    } else if (filePath.startsWith("file://")) {
        fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
    }

From here on I use the FileSystem APIs to read the file.

Could you please let me know whether there is a better way than this?

This makes sense:

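    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter the file path...");
        String filePath = br.readLine();

        Path path = new Path(filePath);
        // the scheme of the path (hdfs://, file://) selects the FileSystem
        FileSystem fs = path.getFileSystem(conf);
        FSDataInputStream inputStream = fs.open(path);
        try {
            System.out.println(inputStream.available());
        } finally {
            inputStream.close(); // the original never closed the stream
        }
        fs.close();
    }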

You don't need that check if you go this way. Get the FileSystem directly from the Path and then do whatever you need with it.

You can get the FileSystem in the following way:

    Configuration conf = new Configuration();
    Path path = new Path(stringPath);
    FileSystem fs = FileSystem.get(path.toUri(), conf);

You don't need to check whether the path starts with hdfs:// or file://. This API does that work for you.
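
To illustrate what that lookup is based on, here is a minimal sketch that only extracts the URI scheme from a path string with java.net.URI. It is not Hadoop's implementation, and the names SchemeDemo and schemeOf are hypothetical, introduced just for this example.

```java
import java.net.URI;

class SchemeDemo {
    // Hypothetical helper: returns the URI scheme of a path string,
    // or "(default)" when the string carries no scheme at all.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "(default)" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("hdfs://namenode:8020/data/input.txt")); // hdfs
        System.out.println(schemeOf("file:///tmp/input.txt"));               // file
        System.out.println(schemeOf("/tmp/input.txt"));                      // (default)
    }
}
```

When a path has no scheme, Hadoop falls back to the default file system from the configuration (fs.default.name in Hadoop 1.x, fs.defaultFS in later versions), which is why loading core-site.xml matters for unprefixed paths.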

Please check the code snippet below, which lists the files under an HDFS path, i.e. a path string that starts with hdfs://. If you provide the Hadoop configuration and a local path, it will also list files from the local file system, i.e. a path string that starts with file://.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // helper method to get the list of files from the HDFS path
    public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration,
                                                     String hdfsPath,
                                                     boolean recursive) {
        // resulting list of files
        List<String> filePaths = new ArrayList<String>();
        FileSystem fs = null;

        try {
            // get the path from the string and then the filesystem;
            // new Path() throws IllegalArgumentException, all other calls only throw IOException
            Path path = new Path(hdfsPath);
            fs = path.getFileSystem(hadoopConfiguration);

            // resolve hdfsPath first to check whether the path exists
            // (either a real directory or a real file);
            // resolvePath() returns the fully qualified variant of the path
            path = fs.resolvePath(path);

            if (recursive) {
                // a queue instead of recursive calls (memory issues with the recursive approach)
                Queue<Path> fileQueue = new LinkedList<Path>();
                fileQueue.add(path);

                while (!fileQueue.isEmpty()) {
                    Path filePath = fileQueue.remove();
                    if (fs.isFile(filePath)) {
                        filePaths.add(filePath.toString());
                    } else {
                        // a directory: list its entries and add them to the queue
                        for (FileStatus fileStatus : fs.listStatus(filePath)) {
                            fileQueue.add(fileStatus.getPath());
                        }
                    }
                }
            } else {
                // non-recursive approach
                if (fs.isDirectory(path)) {
                    for (FileStatus fileStatus : fs.listStatus(path)) {
                        // only files go into the resulting list
                        if (fileStatus.isFile()) {
                            filePaths.add(fileStatus.getPath().toString());
                        }
                    }
                } else {
                    // the path is a file: it is the one and only entry
                    filePaths.add(path.toString());
                }
            }
        } catch (Exception ex) {
            // catches all exceptions, including IOException and IllegalArgumentException
            ex.printStackTrace();
            // if some problem occurs, return an empty list
            return new ArrayList<String>();
        } finally {
            // close the filesystem; no more operations.
            // Note: FileSystem instances are cached and shared by default, so this
            // also closes the instance for any other code that obtained the same one.
            try {
                if (fs != null) {
                    fs.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // the list can be empty if the given path is an empty directory
        // without files and sub-directories
        return filePaths;
    }

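If you really want to work with the java.io.File API, the following method will help you list files from the local file system only. Note that java.io.File takes a plain path such as /tmp/data; a path string that starts with file:// has to be converted first, for example with new File(URI.create(pathString)).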

    import java.io.File;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;

    // helper method to list files from the local path in the local file system
    public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive) {
        // resulting list of files
        List<String> localFilePaths = new ArrayList<String>();
        File localPath = new File(localPathString);

        // the given localPathString does not exist => neither a file nor a directory
        if (!localPath.exists()) {
            System.err.println("\n" + localPathString
                    + " is neither a file nor a directory; please provide correct local path");
            return new ArrayList<String>();
        }

        // at this point localPath exists, either as a directory or as a file
        if (recursive) {
            // a queue instead of recursive calls
            Queue<File> fileQueue = new LinkedList<File>();
            fileQueue.add(localPath);

            while (!fileQueue.isEmpty()) {
                File file = fileQueue.remove();
                if (file.isFile()) {
                    localFilePaths.add(file.getAbsolutePath());
                } else {
                    // a directory: list its entries and add them to the queue;
                    // listFiles() returns null if the directory cannot be read
                    File[] listedFiles = file.listFiles();
                    if (listedFiles != null) {
                        for (File listedFile : listedFiles) {
                            fileQueue.add(listedFile);
                        }
                    }
                }
            }
        } else {
            // non-recursive approach
            if (localPath.isDirectory()) {
                File[] listedFiles = localPath.listFiles();
                if (listedFiles != null) {
                    for (File listedFile : listedFiles) {
                        // only files go into the resulting list
                        if (listedFile.isFile()) {
                            localFilePaths.add(listedFile.getAbsolutePath());
                        }
                    }
                }
            } else {
                // the path is a file: it is the one and only entry
                localFilePaths.add(localPath.getAbsolutePath());
            }
        }

        // the list can be empty if the given path is an empty directory
        // without files and sub-directories
        return localFilePaths;
    }
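
As a possible alternative on Java 8 or later, the same local listing could be sketched with java.nio.file.Files.walk. This is only a sketch under assumptions: the class name LocalListing and the method listFiles are hypothetical, the result is sorted just to make the order predictable, and unlike the method above it throws an IOException when the path does not exist instead of returning an empty list.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalListing {
    // Hypothetical helper: lists regular files under a local path.
    // Accepts either a plain path or a file:// URI string.
    static List<String> listFiles(String localPathString, boolean recursive) throws IOException {
        Path root = localPathString.startsWith("file:")
                ? Paths.get(URI.create(localPathString))
                : Paths.get(localPathString);

        // depth 1 visits the root and its direct children only
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;

        // the stream holds open directory handles, so close it when done
        try (Stream<Path> walk = Files.walk(root, maxDepth)) {
            return walk.filter(Files::isRegularFile)
                       .map(p -> p.toAbsolutePath().toString())
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```

If the given path is itself a regular file, the walk yields just that file, which matches the behaviour of the java.io.File version.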