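Purpose of this document: notes for future reference
1. Environment setup
Four CentOS 7 hosts, each with ClickHouse installed. For installation, see: //www.greatytc.com/p/2ce45a9c30ce
2. Add the configuration file metrika.xml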
vim /etc/clickhouse-server/metrika.xml
<yandex>
<clickhouse_remote_servers>
<!-- Cluster name (user-defined) -->
<my_cluster>
<!-- Shard 1 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.120.87</host>
<port>9000</port>
<user>sunny</user>
<password>sunny</password>
</replica>
</shard>
<!-- Shard 2 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.120.103</host>
<port>9000</port>
<user>sunny</user>
<password>sunny</password>
</replica>
</shard>
<!-- Shard 3 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.120.105</host>
<port>9000</port>
<user>sunny</user>
<password>sunny</password>
</replica>
</shard>
<!-- Shard 4 -->
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.120.140</host>
<port>9000</port>
<user>sunny</user>
<password>sunny</password>
</replica>
</shard>
</my_cluster>
</clickhouse_remote_servers>
<!-- Shard and replica names of this host -->
<macros>
<shard>node01</shard>
<replica>192.168.120.87</replica>
</macros>
<networks>
<ip>::/0</ip>
</networks>
<!-- Optional, may be left out -->
<zookeeper-servers>
<node index="1">
<host>192.168.120.87</host>
<port>2181</port>
</node>
<node index="2">
<host>192.168.120.103</host>
<port>2181</port>
</node>
<node index="3">
<host>192.168.120.105</host>
<port>2181</port>
</node>
</zookeeper-servers>
<!-- Data compression algorithm -->
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
</yandex>
Use scp to send metrika.xml to the other three servers, for example:
scp /etc/clickhouse-server/metrika.xml root@192.168.120.103:/etc/clickhouse-server/metrika.xml
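The scp command above covers only one of the three remaining hosts. Below is a minimal sketch of a loop over all of them, assuming root SSH access to each host as in that command. The `copy_metrika` name and the `RUN` switch are illustrative and not part of the original setup. By default it only prints the commands.

```shell
#!/bin/sh
# Hosts taken from the metrika.xml above; 192.168.120.87 already has the file.
OTHER_HOSTS="192.168.120.103 192.168.120.105 192.168.120.140"
CONF=/etc/clickhouse-server/metrika.xml

# Print one scp command per host; set RUN=1 to execute them instead.
copy_metrika() {
    for host in $OTHER_HOSTS; do
        cmd="scp $CONF root@$host:$CONF"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

copy_metrika
```

Running the script with `RUN=1` in the environment performs the copies. Printing first makes it easy to check the host list before anything is overwritten.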
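Then edit metrika.xml on each of the other three servers. Change only the part marked with the red box in the figure; the values must be different on every server, and everything else stays the same: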

image.png
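The figure itself is not reproduced here. Assuming the highlighted part is the `<macros>` block (the only host-specific section in the file shown above), the per-host change could be sketched with sed as follows. The `render_macros` helper and the shard name `node02` are hypothetical.

```shell
#!/bin/sh
# render_macros SHARD REPLICA
# Reads metrika.xml on stdin and writes a copy to stdout in which the
# <shard> and <replica> values inside <macros> are replaced.
# Lines outside the <macros> block are left untouched.
render_macros() {
    sed -e "/<macros>/,/<\/macros>/ s|<shard>.*</shard>|<shard>$1</shard>|" \
        -e "/<macros>/,/<\/macros>/ s|<replica>.*</replica>|<replica>$2</replica>|"
}

# Example for the second host (node02 is an assumed name):
# render_macros node02 192.168.120.103 < metrika.xml > metrika.new.xml
```

The sed range address restricts the substitution to the lines between `<macros>` and `</macros>`, so the `<shard>` and `<replica>` elements of the cluster definition are not affected.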
Then edit config.xml on all four servers so that it includes metrika.xml:
vim /etc/clickhouse-server/config.xml
Add the following line:
<include_from>/etc/clickhouse-server/metrika.xml</include_from>

image.png
Restart the service:
service clickhouse-server restart
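After the restart it can be worth confirming that the server picked up the cluster definition. The sketch below queries the `system.clusters` table and compares the number of entries for `my_cluster` with the expected value. The `check_cluster` helper and the `CH_CLIENT` variable are illustrative names; if the server requires credentials, they would have to be added to `CH_CLIENT` (for example with `--user` and `--password`).

```shell
#!/bin/sh
# Client command; override it to add credentials or a host.
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# check_cluster EXPECTED
# Succeeds if system.clusters lists EXPECTED entries for my_cluster.
# With one replica per shard, each entry corresponds to one shard.
check_cluster() {
    expected=$1
    actual=$($CH_CLIENT --query "SELECT count() FROM system.clusters WHERE cluster = 'my_cluster'") || return 1
    [ "$actual" = "$expected" ]
}

# Usage on any of the four servers:
# check_cluster 4 && echo "my_cluster is configured"
```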
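3. Testing
This kind of cluster is only suitable for tables that use the MergeTree engine, not ReplicatedMergeTree. Create the following tables in the database (ClickHouseTest in this example) on every server: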
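-- Local table
-- Equivalent to the legacy form MergeTree(createDate, (id, user_code), 8192),
-- which recent ClickHouse releases no longer accept by default
create table user_local
(
id Int32,
user_code String,
user_name String,
createDate Date
) engine = MergeTree
partition by toYYYYMM(createDate)
order by (id, user_code)
settings index_granularity = 8192;
-- Distributed table: cluster my_cluster, database ClickHouseTest, local table user_local, sharding key rand()
create table user_all as user_local ENGINE = Distributed(my_cluster, ClickHouseTest, user_local, rand());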
Insert data:
insert into user_all (id, user_code, user_name, createDate) values (1, '1001', '张三', '2020-01-01');
insert into user_all (id, user_code, user_name, createDate) values (2, '1002', '张三', '2020-01-02');
insert into user_all (id, user_code, user_name, createDate) values (3, '1003', '张三', '2020-01-03');
insert into user_all (id, user_code, user_name, createDate) values (4, '1004', '张三', '2020-01-04');
insert into user_all (id, user_code, user_name, createDate) values (5, '1005', '张三', '2020-01-05');
insert into user_all (id, user_code, user_name, createDate) values (6, '1006', '张三', '2020-01-06');
insert into user_all (id, user_code, user_name, createDate) values (7, '1007', '张三', '2020-01-07');
insert into user_all (id, user_code, user_name, createDate) values (8, '1008', '张三', '2020-01-08');
insert into user_all (id, user_code, user_name, createDate) values (9, '1009', '张三', '2020-01-09');
insert into user_all (id, user_code, user_name, createDate) values (10, '1010', '张三', '2020-01-10');
insert into user_all (id, user_code, user_name, createDate) values (11, '1011', '张三', '2020-01-11');
insert into user_all (id, user_code, user_name, createDate) values (12, '1012', '张三', '2020-01-12');
insert into user_all (id, user_code, user_name, createDate) values (13, '1013', '张三', '2020-01-13');
insert into user_all (id, user_code, user_name, createDate) values (14, '1014', '张三', '2020-01-14');
insert into user_all (id, user_code, user_name, createDate) values (15, '1015', '张三', '2020-01-15');
Query user_local and user_all on one of the servers.
The results are as follows:
user_local

image.png
user_all

image.png
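The comparison shown in the screenshots can also be run from the command line. This is a minimal sketch, assuming the tables live in the ClickHouseTest database as in the Distributed definition above. The helper names `row_count` and `rows_per_host` and the `CH_CLIENT` variable are illustrative.

```shell
#!/bin/sh
# Client command; override it to add credentials or a host.
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# Print the row count of one table in the ClickHouseTest database.
row_count() {
    $CH_CLIENT --database ClickHouseTest --query "SELECT count() FROM $1"
}

# Print how many rows each server holds, as seen through the distributed table.
rows_per_host() {
    $CH_CLIENT --database ClickHouseTest \
        --query "SELECT hostName() AS host, count() AS row_count FROM user_all GROUP BY host ORDER BY host"
}

# Usage on any of the four servers:
# row_count user_local    # rows stored on this server only
# row_count user_all      # rows across the whole cluster
# rows_per_host           # distribution of rows over the shards
```

Because the sharding key is rand(), the per-host counts printed by `rows_per_host` will generally differ from run to run of the whole test.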
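Summary: user_local shows only the rows stored on the local server, while user_all returns the union of the rows in the user_local tables on all servers in the cluster. This is a multi-shard, single-replica cluster. It is set up purely through ClickHouse configuration files and does not need ZooKeeper. Rows inserted through user_all are spread at random over the user_local tables in the cluster, because the sharding key is rand(). A query on user_all collects the matching rows from every user_local table in the cluster and merges them into one result. This kind of cluster cannot provide high availability: each shard holds the only copy of its data, so if one server goes down, the cluster as a whole can no longer serve queries and inserts through user_all properly.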
