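Calling a MapReduce job from a simple Java program

I have been trying to call a MapReduce job from a simple Java program in the same package. I tried to reference the MapReduce jar file in my Java program and call it with the runJar(String args[]) method, passing the input and output paths for the MapReduce job. But the program didn't work.

How do I run such a program, where I just pass the input path, output path and jar path to its main method? Is it possible to run a MapReduce job (jar) that way? I want to do this because I want to run several MapReduce jobs one after another, with my Java program calling each job by referring to its jar file. If this is possible, I might as well use a simple servlet to make such calls and use the output files for graphing.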

    import org.apache.hadoop.util.RunJar;
    import java.util.*;

    /**
     * @author root
     */
    public class callOther {
        public static void main(String args[]) throws Throwable {
            ArrayList<String> arg = new ArrayList<String>();
            String output = "/root/Desktp/output";
            arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
            arg.add("/root/Desktop/input");
            arg.add(output);
            RunJar.main(arg.toArray(new String[0]));
        }
    }

Oh, please don't do it with runJar. The Java API is very good.

Here is how you can start a job from normal code:

    // create a configuration
    Configuration conf = new Configuration();
    // create a new job based on the configuration
    Job job = new Job(conf);
    // here you have to put your mapper class
    job.setMapperClass(Mapper.class);
    // here you have to put your reducer class
    job.setReducerClass(Reducer.class);
    // here you have to set the jar which is containing your
    // map/reduce class, so you can use the mapper class
    job.setJarByClass(Mapper.class);
    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // this is setting the format of your input, can be TextInputFormat
    job.setInputFormatClass(SequenceFileInputFormat.class);
    // same with output
    job.setOutputFormatClass(TextOutputFormat.class);
    // here you can set the path of your input
    SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("files/out/processed/");
    fs.delete(out, true);
    // finally set the empty out path
    TextOutputFormat.setOutputPath(job, out);
    // this waits until the job completes and prints debug out to STDOUT or whatever
    // has been configured in your log4j properties.
    job.waitForCompletion(true);
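The question mentions running several jobs one after another. One way to sketch that on top of the API above is to treat each job as a step that returns true on success, as `job.waitForCompletion(true)` does, and to stop at the first failure. The sketch below uses only the JDK so that it runs without a cluster; `JobChain` and the step names are hypothetical, and the lambdas are stand-ins for real jobs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Hadoop): runs named steps in order
// and stops at the first step that reports failure.
class JobChain {
    private final Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();

    // With Hadoop, a step could be: () -> job.waitForCompletion(true)
    JobChain add(String name, Callable<Boolean> step) {
        steps.put(name, step);
        return this;
    }

    // Returns the names of the steps that completed successfully, in order.
    List<String> run() throws Exception {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Callable<Boolean>> e : steps.entrySet()) {
            if (!e.getValue().call()) {
                break; // a failed job leaves no usable input for the next one
            }
            completed.add(e.getKey());
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        List<String> done = new JobChain()
                .add("wordcount", () -> true)  // stand-in for a successful job
                .add("sort", () -> false)      // stand-in for a failed job
                .add("report", () -> true)     // never reached
                .run();
        System.out.println(done);
    }
}
```

Run as written, this prints `[wordcount]`, because the chain stops at the failing second step. In a real chain each job would typically read the output path of the previous one.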

If you are using an external cluster, you have to add the following information to your configuration:

    // this should be like defined in your mapred-site.xml
    conf.set("mapred.job.tracker", "jobtracker.com:50001");
    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

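This should be no problem when hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because a Hadoop job may take anywhere from minutes to hours to complete ;)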
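A progress indicator implies that the web request must not block until the job finishes. Below is a minimal JDK-only sketch of that idea: the work runs on a background thread and the caller polls a progress value. `BackgroundJob` is a hypothetical stand-in that simulates the work with sleeps; with Hadoop one would presumably call `job.submit()` instead of `job.waitForCompletion(true)` and poll `job.isComplete()`, `job.mapProgress()` and `job.reduceProgress()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a long-running job that exposes its progress.
class BackgroundJob {
    private final AtomicInteger percent = new AtomicInteger(0);

    // Simulates work in ten steps; a real implementation would drive the Hadoop job here.
    boolean execute() throws InterruptedException {
        for (int i = 1; i <= 10; i++) {
            Thread.sleep(20);
            percent.set(i * 10);
        }
        return true;
    }

    // Safe to call from another thread, e.g. from a status request.
    int progressPercent() {
        return percent.get();
    }

    public static void main(String[] args) throws Exception {
        BackgroundJob job = new BackgroundJob();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Boolean> task = job::execute;
        Future<Boolean> result = pool.submit(task);
        // Poll until the background task has finished.
        while (!result.isDone()) {
            System.out.println("progress: " + job.progressPercent() + "%");
            TimeUnit.MILLISECONDS.sleep(50);
        }
        System.out.println("succeeded: " + result.get());
        pool.shutdown();
    }
}
```

In a servlet, the trigger request would start the task and return immediately, and a second request would report the current progress value.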

For YARN (> Hadoop 2)

For YARN, the following configuration needs to be set.

    // this should be like defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
    // framework is now "yarn", should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    // like defined in hdfs-site.xml (Hadoop 2 also accepts the newer key name fs.defaultFS)
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

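Calling a MapReduce job from a Java web application (servlet)

You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are as follows:

Step 1: First create a servlet class that acts as the MapReduce driver. Also develop the Map and Reduce classes. Here is a sample code snippet: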

CallJobFromServlet.java

    public class CallJobFromServlet extends HttpServlet {

        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            Configuration conf = new Configuration();
            // Replace CallJobFromServlet.class name with your servlet class
            Job job = new Job(conf, " CallJobFromServlet.class");
            job.setJarByClass(CallJobFromServlet.class);
            job.setJobName("Job Name");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class); // Replace Map.class name with your Mapper class
            job.setNumReduceTasks(30);
            job.setReducerClass(Reducer.class); // Replace Reduce.class name with your Reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // Job Input path
            FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/user/hduser/input/"));
            // Job Output path
            FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/user/hduser/output"));

            job.waitForCompletion(true);
        }
    }

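Step 2: Place all related jar files (Hadoop and application-specific jars) in the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop 'conf' folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jar files from the Hadoop lib folder to the web server (Tomcat) lib directory. The list of jar names is as follows: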

    1. commons-beanutils-1.7.0.jar
    2. commons-beanutils-core-1.8.0.jar
    3. commons-cli-1.2.jar
    4. commons-collections-3.2.1.jar
    5. commons-configuration-1.6.jar
    6. commons-httpclient-3.0.1.jar
    7. commons-io-2.1.jar
    8. commons-lang-2.4.jar
    9. commons-logging-1.1.1.jar
    10. hadoop-client-1.0.4.jar
    11. hadoop-core-1.0.4.jar
    12. jackson-core-asl-1.8.8.jar
    13. jackson-mapper-asl-1.8.8.jar
    14. jersey-core-1.8.jar

Step 3: Deploy your web application to the web server (in Tomcat's 'webapps' folder).

Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. Here is a sample code snippet:

index.jsp

    <!-- method="post" is needed because the servlet above only implements doPost -->
    <form id="trigger_hadoop" name="trigger_hadoop" method="post" action="./CallJobFromServlet">
        <span class="back">Trigger Hadoop Job from Web Page </span>
        <input type="submit" name="submit" value="Trigger Job" />
    </form>

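Another way, for jobs already implemented in the Hadoop examples, also requires importing the Hadoop jars. Then just call the static main function of the desired job class with the appropriate String[] arguments.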
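When the job class is known at compile time, this is an ordinary static call such as `SomeJob.main(new String[] {input, output})`. When only the class name is known at run time, reflection can do the same thing. The sketch below is a minimal illustration under those assumptions: `FakeDriver` is a hypothetical stand-in for a real job driver and `MainInvoker` is a hypothetical helper. Note that many driver classes end their `main` with `System.exit(...)`, which would also terminate the calling program.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-in for a job driver class such as one of the Hadoop examples.
class FakeDriver {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args; // a real driver would configure and run the job here
    }
}

class MainInvoker {
    // Looks up "public static void main(String[])" on the named class and calls it.
    static void invokeMain(String className, String... args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast so the array is passed as one argument
    }

    public static void main(String[] args) throws Exception {
        // The paths are placeholders, not taken from a real cluster.
        invokeMain(FakeDriver.class.getName(), "/input/path", "/output/path");
        System.out.println(Arrays.toString(FakeDriver.lastArgs));
    }
}
```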

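Because map and reduce run on different machines, all the classes and jars you reference must be moved from machine to machine.

If you have a packaged jar and run it from your desktop, @ThomasJungblut's answer is OK. But if you run it in Eclipse (right-click your class and run), it doesn't work.

Instead of: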

    job.setJarByClass(Mapper.class);

Use:

    job.setJar("build/libs/hdfs-javac-1.0.jar");

At the same time, your jar's manifest must include the Main-Class attribute, which names your main class.

For Gradle users, you can put these lines in build.gradle:

    jar {
        manifest {
            attributes("Main-Class": mainClassName)
        }
    }
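To check whether a built jar actually carries the Main-Class attribute, the manifest can be read with the JDK's `java.util.jar` classes. The following is a minimal sketch: `ManifestCheck` is a hypothetical helper, and `com.example.WordTool` is a made-up class name used only to produce a demo jar when no path is given. As far as I know, RunJar falls back to taking the main class name from the argument after the jar path when the attribute is missing, so this check may also help explain a failing RunJar call.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class ManifestCheck {
    // Returns the Main-Class attribute of the jar, or null if it has none.
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null
                    : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Writes a temporary jar that contains only a manifest (demo purposes only).
    static File writeDemoJar(String mainClass) throws IOException {
        Manifest mf = new Manifest();
        // Manifest-Version must be present, otherwise the attributes are not written.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            // no further entries needed; the constructor writes the manifest
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // Pass a jar path as the first argument, or let the demo jar be generated.
        File jar = args.length > 0 ? new File(args[0]) : writeDemoJar("com.example.WordTool");
        System.out.println("Main-Class: " + mainClassOf(jar));
    }
}
```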

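I can't think of many ways to do this without involving the hadoop-core library (or indeed, as @ThomasJungblut said, why you would want to).

But if you absolutely must, you could set up an Oozie server with a workflow for your job, and then use the Oozie web service interface to submit the workflow to Hadoop.

Again, this seems like a lot of work for something that could be solved with Thomas's answer (include the hadoop-core jar and use his code snippet).

You can do it like this: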

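    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class Test {
        public static void main(String[] args) throws Exception {
            // YourJob is your own class and must implement org.apache.hadoop.util.Tool
            int res = ToolRunner.run(new Configuration(), new YourJob(), args);
            System.exit(res);
        }
    }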
