clickhouse 配置过程记录
参考资料
ubuntu 18.04 ×64 clickhouse安装流程
-
检查当前CPU是否支持SSE 4.2
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
- 检测后支持,继续下一步
-
执行以下命令开始安装
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
由于安装时间超过3小时,暂停安装,改采用下载deb包安装。
-
下载deb安装包
- 下载3个文件 clickhouse-client,clickhouse-server,clickhouse-common-static
-
执行安装命令
sudo dpkg -i clickhouse-common-static_20.6.3.28_amd64.deb
sudo dpkg -i clickhouse-client_20.6.3.28_all.deb
sudo dpkg -i clickhouse-server_20.6.3.28_all.deb
-
启动
sudo service clickhouse-server start
clickhouse-client
-
启动失败,错误信息如下
Code: 516. DB::Exception: Received from localhost:9000. DB::Exception:default: Authentication failed: password is incorrect or there is no user with such name.
-
重新执行命令
clickhouse-client --password
-
启动成功,输出语句
ClickHouse client version 20.6.3.28 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.6.3 revision 54436. -
创建数据库 BITest
CREATE DATABASE IF NOT EXISTS BITEST
-
创建TAXI_TRIPS 表
CREATE TABLE BITEST.TAXI_TRIPS
(
Trip ID
String,
Taxi ID
String,
Trip Start Timestamp
DateTime,
Trip End Timestamp
Nullable(DateTime),
Trip Seconds
Nullable(UInt64),
Trip Miles
Nullable(Float64),
Pickup Census Tract
Nullable(UInt64),
Dropoff Census Tract
Nullable(UInt64),
Pickup Community Area
Nullable(UInt64),
Dropoff Community Area
Nullable(UInt64),
Fare
Nullable(String),
Tips
Nullable(String),
Tolls
Nullable(String),
Extras
Nullable(String),
Trip Total
Nullable(String),
Payment Type
Nullable(String),
Company
Nullable(String),
Pickup Centroid Latitude
Nullable(Float64),
Pickup Centroid Longitude
Nullable(Float64),
Pickup Centroid Location
Nullable(String),
Dropoff Centroid Latitude
Nullable(Float64),
Dropoff Centroid Longitude
Nullable(Float64),
Dropoff Centroid Location
Nullable(String),
Community Areas
Nullable(UInt64)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(toDate(Trip Start Timestamp
))
ORDER BY Trip ID
SETTINGS index_granularity = 8192;
11 退出client ,执行导入数据的命令行
- clickhouse-client --password --query='INSERT INTO BITEST.TAXI_TRIPS FORMAT CSVWithNames' < Taxi_Trips.csv --date_time_input_format='best_effort' --max_insert_block_size=1024
-
执行报错,显示数据导入值不对,报错位置在13572634行。打开文件后发现最后五行数据值不合理。删除最后五行。
- err.png
- errordata.png
-
删除表中上一次导入的数据,重新导入csv文件,然后检证数据条数。
- [图片上传失败...(image-d482ca-1598519252266)]
- (在执行第三个SQL时发现问题)在把字段Trip Start Timestamp作为DateTime类型导入时,凌晨的“00:00:00~00:00:59"的数据全部读入错误,导致计算时一天只有1~23小时的时间段。
- err_output3.png
- 重新建表,数据引擎选为Log,用string类型读入时间。
CREATE TABLE BITEST.TAXI_TRIPS
(
Trip ID
String,
Taxi ID
String,
Trip Start Timestamp
String,
Trip End Timestamp
Nullable(String),
Trip Seconds
Nullable(UInt64),
Trip Miles
Nullable(Float64),
Pickup Census Tract
Nullable(UInt64),
Dropoff Census Tract
Nullable(UInt64),
Pickup Community Area
Nullable(UInt64),
Dropoff Community Area
Nullable(UInt64),
Fare
Nullable(String),
Tips
Nullable(String),
Tolls
Nullable(String),
Extras
Nullable(String),
Trip Total
Nullable(String),
Payment Type
Nullable(String),
Company
Nullable(String),
Pickup Centroid Latitude
Nullable(Float64),
Pickup Centroid Longitude
Nullable(Float64),
Pickup Centroid Location
Nullable(String),
Dropoff Centroid Latitude
Nullable(Float64),
Dropoff Centroid Longitude
Nullable(Float64),
Dropoff Centroid Location
Nullable(String),
Community Areas
Nullable(UInt64)
)
ENGINE = Log()
16.再次执行导入数据命令行
- clickhouse-client --password --query='INSERT INTO BITest1.TAXI_TRIPS FORMAT CSVWithNames' < Taxi_Trips.csv --max_insert_block_size=1024
17.再次检证条数和Trip Start Timestamp`字段,读入成功。
报表分析
- 问题一: ⾏⻋⾥程最多的出租⻋Top 100
SELECT sum(
Trip Miles
) AS Miles ,Taxi ID
AS ID
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
order by Miles DESC
limit 100
- output1.png
-
问题二:搭载乘客最多的出租车Top 100
- 提供的数据中并没乘客数量字段,改为计算最多的出租车
SELECT count(1) as num ,
Taxi ID
AS ID
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
order by num DESC
limit 100
-output2.png -
问题三:分析出租⻋每⽇⾼峰期和低峰期 (⼩时级别)
SELECT
AVG(t1.num) AS avg_num,
t1.hour
FROM
(
SELECT
substring(Trip Start Timestamp
, 1, 10) AS date,
if(substring(Trip Start Timestamp
, 21, 2) = 'AM', concat(substring(Trip Start Timestamp
, 12, 2), 'AM'), concat(substring(Trip Start Timestamp
, 12, 2), 'PM')) AS hour,
count(Taxi ID
) AS num
FROM BITEST.TAXI_TRIPS
GROUP BY
date,
hour
) AS t1
GROUP BY t1.hour
ORDER BY avg_num ASC
-output3.png -
问题四:分析出出租⻋平均搭载时⻓。
SELECT
Taxi ID
AS ID,
avg(Trip Seconds
) AS avg_time
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
ORDER BY avg_time DESC
LIMIT 1000
-output4.png