Quickly Setting Up a Hadoop Test Environment with Docker

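Anyone who has set up Hadoop knows how tedious it is: there is a lot of environment to prepare and many configuration files to edit, so building a usable test environment eats up a lot of time. Fortunately, Docker exists to solve exactly this kind of problem. With Docker we can quickly bring up a usable Hadoop cluster for testing.

This article uses a Dockerfile from GitHub, with a few small changes that make it work better from within mainland China. GitHub link

Clone the GitHub repository and change into the repository directory:

The following is excerpted from README.md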

Apache Hadoop 2.7.1 Docker image



Note: this is the master branch - for a particular Hadoop version always check the related branch

A few weeks ago we released an Apache Hadoop 2.3 Docker image - this quickly became the most popular Hadoop image in the Docker registry.

Following the success of our previous Hadoop Docker images, the feedback and feature requests we received, we aligned with the Hadoop release cycle, so we have released an Apache Hadoop 2.7.1 Docker image - same as the previous version, it's available as a trusted and automated build on the official Docker registry.

FYI: All the former Hadoop releases (2.3, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0) are available in the GitHub branches or our Docker Registry - check the tags.

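Changes for use in China

This version sets the time zone in the Dockerfile to China. Downloading the files below from inside China can be very slow, so instead of fetching them with curl during the build, all of them are supplied locally. The following files therefore need to be present in the current directory:

They can each be downloaded separately through whatever channel works for you.

A docker-compose.yml file has been added, with a logs mapping, for quick startup.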
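Because the build no longer downloads these files itself, a missing file only surfaces as a failed build step. A small pre-check can catch that earlier. The following is a minimal sketch and not part of the repository: the function name `check_required_files` is hypothetical, and the file names are passed in as arguments rather than assumed.

```shell
# Report which of the given files are missing from the current directory.
# Returns 0 if every file exists, 1 otherwise.
# Hypothetical helper: file names are supplied by the caller.
check_required_files() {
  _missing=0
  for _f in "$@"; do
    if [ ! -f "$_f" ]; then
      echo "missing: $_f" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage (FILE1 and FILE2 are placeholders for the files listed above):
#   check_required_files FILE1 FILE2 && docker build -t sequenceiq/hadoop-docker:2.7.1 .
```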
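To give an idea of what such a compose file might contain, here is a rough sketch written as a shell function that generates one. It is not the repository's actual file. The service name, the published ports (the Hadoop 2.x default web UI ports), and the log directory `/usr/local/hadoop/logs` inside the image are all assumptions, and `write_compose_file` is a hypothetical helper name.

```shell
# Write an illustrative docker-compose.yml.
# $1 = output path (defaults to ./docker-compose.yml)
# Assumptions: service name, published ports, and the in-container log path.
write_compose_file() {
  cat > "${1:-docker-compose.yml}" <<'EOF'
version: "2"
services:
  hadoop:
    image: sequenceiq/hadoop-docker:2.7.1
    ports:
      - "50070:50070"   # NameNode web UI (Hadoop 2.x default port)
      - "8088:8088"     # ResourceManager web UI (Hadoop 2.x default port)
    volumes:
      - ./logs:/usr/local/hadoop/logs   # assumed log directory inside the image
EOF
}
```

Mapping the log directory to the host means the daemon logs can be read without entering the container, and they are still there after the container is removed.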

Build the image

If you'd like to try directly from the Dockerfile you can build the image as:

docker build -t sequenceiq/hadoop-docker:2.7.1 .

Pull the image

The image is also released as an official Docker image from Docker's automated build repository - you can always pull or refer to the image when launching containers.

docker pull sequenceiq/hadoop-docker:2.7.1

Start with docker-compose

docker-compose up -d
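`docker-compose up -d` returns as soon as the container has been started, while the Hadoop daemons inside it may need a little longer to come up. One simple way to wait is to retry a probe command until it succeeds. The sketch below uses a hypothetical `retry` helper. The probe in the usage comment assumes the NameNode web UI is published on port 50070 of the host, which depends on your docker-compose.yml.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run.
retry() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  return 1
}

# Usage (assumes port 50070 is published to the host):
#   retry 30 2 curl -fsS -o /dev/null http://localhost:50070/
```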

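Verify that the test environment works

Use

docker exec -it <container-name> bash

to open a shell inside the container.

Then run the following commands: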

cd $HADOOP_PREFIX
# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'

# check the output
bin/hdfs dfs -cat output/*
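The `grep` example job extracts every string matching the regular expression from the files in `input` and counts how often each one occurs. To get a feel for the expected result, roughly the same computation can be done locally with standard tools. This is only an illustrative equivalent, not how Hadoop implements it. `count_matches` is a hypothetical helper, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
# Count occurrences of each regex match across the given files,
# most frequent first, printed as "count<TAB>match".
# $1 = extended regular expression, remaining arguments = input files.
count_matches() {
  _pattern=$1
  shift
  grep -ohE -- "$_pattern" "$@" \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1 "\t" $2 }'
}

# Usage, run against a local copy of the same input files:
#   count_matches 'dfs[a-z.]+' input/*.xml
```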

Hadoop native libraries, build, Bintray, etc

The Hadoop build process is no easy task - it requires lots of libraries in the right versions (protobuf, etc.) and takes some time. We have simplified all of this: we made the build and released a 64-bit version of the Hadoop native libs on this Bintray repo. Enjoy.

Automate everything

As we have mentioned previously, a Dockerfile was created and released in the official Docker repository

Wrapping up

Finally, here are a few commonly used Hadoop web URLs:
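As a quick reference, the sketch below prints the web UI addresses using the Hadoop 2.x default ports. The ports are the stock defaults rather than values taken from this particular setup, so they only apply if the configuration leaves them unchanged and the ports are published to the host. `hadoop_web_urls` is a hypothetical helper name.

```shell
# Print the default Hadoop 2.x web UI URLs for a given host.
# $1 = host name or IP (defaults to localhost)
# Assumes the stock default ports; adjust if your configuration differs.
hadoop_web_urls() {
  _host=${1:-localhost}
  printf 'NameNode http://%s:50070\n' "$_host"
  printf 'DataNode http://%s:50075\n' "$_host"
  printf 'ResourceManager http://%s:8088\n' "$_host"
  printf 'NodeManager http://%s:8042\n' "$_host"
}
```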
