PredictionIO + Universal Recommender: A Summary of Problems in Rapidly Building and Deploying a Recommendation Engine (3)


Although PredictionIO + Universal Recommender lets small and medium-sized organizations quickly build and deploy a personalized recommendation engine based on collaborative filtering over user behavior, and the development cost at the engine level is close to zero, some prerequisites still apply. For example, the organization should ideally already run reasonably stable Hadoop and Spark clusters, and have at least some developers and operations staff who know the Spark platform; otherwise the technical team will spend a long time stepping on pitfalls and in trial and error.

This article lists some Spark-related exceptions you may run into while using PredictionIO + Universal Recommender, together with the reasoning behind each fix and the final solution, for reference.

 

1. java.lang.StackOverflowError during training

This one is simple. Per the documentation, you can avoid the error by specifying memory sizes on the command line when running training, for example:

pio train -- --driver-memory 8g --executor-memory 8g --verbose
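Everything after the bare "--" is passed through by pio to spark-submit, so other Spark options can be supplied in the same position. As an illustrative variant (the -Xss setting is my own suggestion for stubborn stack overflows, not part of the original fix):

pio train -- --driver-memory 8g --executor-memory 8g --conf spark.driver.extraJavaOptions=-Xss8m --verbose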

 

2. NoSuchMethodError for SparkContext.emptyRDD during training

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.emptyRDD(Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/EmptyRDD;
        at com.actionml.URAlgorithm.getRanksRDD(URAlgorithm.scala:534)
        at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:340)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)

This is caused by a mismatch between the Spark version used to compile the engine and the one used to run it.

Looking at the Spark 2.1.1 source on GitHub:
https://github.com/apache/spark/blob/v2.1.1/core/src/main/scala/org/apache/spark/SparkContext.scala
the emptyRDD method does exist:

/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)

but compared with older versions its return type has changed: it is now RDD[T] rather than EmptyRDD. So code compiled against Spark 1.4.0 builds fine but fails at runtime under 2.1.1; the two versions of this method are binary-incompatible.
If you have modified build.sbt in the way described in my previous note, this problem can be avoided.
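As a minimal sketch of that idea (the variable name and module list are illustrative, not the exact contents of the Universal Recommender build.sbt): pin the Spark dependencies to the version actually running on the cluster and mark them "provided", so the cluster's own jars are used at runtime.

val sparkVersion = "2.1.1" // must match the Spark version on the cluster

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)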
 
 
3. Jersey version mismatch between YARN and Spark
[INFO] [ServerConnector] Started ServerConnector@bd93bc3{HTTP/1.1}{0.0.0.0:4040}
[INFO] [Server] Started @6428ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 21 more
Fix: in the sparkConf section of engine.json, set
"spark.hadoop.yarn.timeline-service.enabled": "false",
 
For a deeper look at this issue, see: https://markobigdata.com/2016/08/01/apache-spark-2-0-0-installation-and-configuration/
 
 
4. YARN bug in handling empty parameters
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7772d266{/jobs,null,UNAVAILABLE}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:775)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:773)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:773)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:867)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is a YARN-related bug: empty parameters are not handled correctly. (In the stack trace above, Spark's YarnSparkHadoopUtil.setEnvFromInputString splits each environment entry on "=" and throws ArrayIndexOutOfBoundsException: 1 when the value part is missing.)
https://stackoverflow.com/questions/37284336/spark-job-failing-with-java-lang-arrayindexoutofboundsexception-1
https://issues.apache.org/jira/browse/YARN-3768
 
Workaround: modify spark-env.sh and force-set a dummy, well-formed parameter, which bypasses the problem.
Edit spark/conf/spark-env.sh and add the following line (the HADOOP_CONF_DIR path is simply what this cluster uses):
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop"

 

5. YARN symlink bug
[WARN] [TaskSetManager] Lost task 3.0 in stage 173.0 (TID 250, bigdata01, executor 3): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/hadoop/apache-hadoop/hadoop/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
jar:file:/home/hadoop/apache-hadoop/hadoop-2.7.2/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar

        at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:570)
        at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
        at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
        at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I am not sure whether this strictly counts as a bug, but in any case: if the YARN configuration refers to the Hadoop directory through a symlink, this problem can occur. See https://interset.zendesk.com/hc/en-us/articles/230751687-PhoenixToElasticSearchJob-Fails-with-Multiple-ES-Hadoop-versions-detected-in-the-classpath-

The fix is also simple: on the NodeManagers, change every configuration entry that reaches the Hadoop directory through a symlink to use the real path instead, as sketched below.
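A sketch of the diagnosis, assuming the layout implied by the duplicate jar paths in the log above, where /home/hadoop/apache-hadoop/hadoop is a symlink to hadoop-2.7.2 (the exact configuration keys on your cluster may differ):

# Confirm that the configured Hadoop directory is a symlink:
ls -l /home/hadoop/apache-hadoop
# hadoop -> /home/hadoop/apache-hadoop/hadoop-2.7.2

# In yarn-site.xml on the NodeManagers (e.g. yarn.nodemanager.local-dirs)
# and in any env scripts, replace the symlinked prefix
#   /home/hadoop/apache-hadoop/hadoop/...
# with the real path
#   /home/hadoop/apache-hadoop/hadoop-2.7.2/...
# then restart the NodeManagers.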

 

6. Spark job execution error: sparkDriver cannot bind

[WARN] [Utils] Service "sparkDriver" could not bind on port 0. Attempting port 1.
[ERROR] [SparkContext] Error initializing SparkContext.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service "sparkDriver" failed after 60 retries (starting from 0)! Consider explicitly setting the appropriate port for the service "sparkDriver" (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:437)
        at sun.nio.ch.Net.bind(Net.java:429)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
For this error, many articles online tell you to modify spark-env.sh and add export SPARK_LOCAL_IP="127.0.0.1".
But that advice only applies to single-machine Spark installations. This IP is the address Spark binds for callbacks to the local machine, so it should be set to the machine's real IP address (check it with ifconfig).
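That is, on every node spark/conf/spark-env.sh should carry that node's own address; the value below is a placeholder of mine, not a value from the original post:

export SPARK_LOCAL_IP="192.168.1.10" # this node's real IP, as shown by ifconfig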
 
