Hive is a data warehouse analysis system built on top of Hadoop. It provides a rich SQL query interface for analyzing data stored in the Hadoop Distributed File System. Within the Hadoop architecture, Hive plays the role of a SQL parser: it accepts the user's statement, analyzes it, compiles it into an executable plan composed of MapReduce programs, generates the corresponding MapReduce jobs according to that plan, and submits them to the Hadoop cluster for execution, returning the final result.
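Once Hive is installed you can observe this compilation step directly: the EXPLAIN statement prints the stage plan, including the map/reduce operator tree, that Hive generates for a query. test_table below is just a placeholder for any existing table:

hive> EXPLAIN SELECT name, count(*) FROM test_table GROUP BY name;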
Installing Apache Hive requires Hadoop and MySQL. This guide assumes you already know how to install Hadoop and MySQL, so the Netkiller OSCM one-click installation scripts are used below to set up Hadoop and MySQL; for detailed installation steps, please refer to the author's related articles.
By default, Hive stores its metadata in an embedded Derby database; in production environments, MySQL is normally used to store the Hive metadata instead.
Use the script below to install MySQL 5.7 with a single command. After installation it prints the initial root password; log in with that initial password and change it to a password of your own.
curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql.server.sh | bash

2016-02-16T08:22:58.253030Z 1 [Note] A temporary password is generated for root@localhost: sd%%my.Ak7Ma
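As a quick follow-up, log in with the temporary password printed above and set your own; MySQL 5.7 refuses most statements until the expired temporary password has been changed. The new password below is only a placeholder:

mysql -u root -p
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourNewPassword!7';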
Install the MySQL JDBC connector library:
curl -s https://raw.githubusercontent.com/oscm/shell/master/database/mysql/5.7/mysql-connector-java.sh | bash
Create a hive database to store the Hive metadata; both the username and the password used to access it will be hive.
mysql> CREATE DATABASE hive;
Query OK, 1 row affected (0.03 sec)
Create the hive user and grant it access to the hive database:
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.04 sec)

mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye
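To verify the new account works, a minimal check from the shell (this assumes MySQL is running on localhost):

mysql -u hive -phive -e "SHOW DATABASES;"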
Install Hadoop in single-node mode:
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/hadoop-2.8.0.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/single.sh | bash
curl -s https://raw.githubusercontent.com/oscm/shell/master/distributed/hadoop/startup.sh | bash
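After the startup script finishes, a quick sanity check with jps should show the Hadoop daemons running; the process IDs will differ on your machine:

[hadoop@localhost ~]$ jps
2130 NameNode
2262 DataNode
2441 SecondaryNameNode
2598 ResourceManager
2702 NodeManager
2998 Jps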
Download the latest stable release, apache-hive-2.1.1-bin.tar.gz, from an Apache mirror site:
cd /usr/local/src
wget http://mirrors.hust.edu.cn/apache/hive/stable-2/apache-hive-2.1.1-bin.tar.gz
tar zxf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin /srv/apache-hive-2.1.1
ln -s /srv/apache-hive-2.1.1/ /srv/apache-hive
chown hadoop:hadoop -R /srv/apache-hive-2.1.1
Configure the Hive environment variables, both for Hive itself and for your login shell:

cat > /srv/apache-hive/conf/hive-env.sh <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF

cat >> ~/.bash_profile <<'EOF'
export JAVA_HOME=/srv/java
export HADOOP_HOME=/srv/apache-hadoop
export HBASE_HOME=/srv/apache-hbase
export HIVE_HOME=/srv/apache-hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
EOF

source ~/.bash_profile
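To confirm the environment took effect, a quick check (output shown assumes the paths above; the version banner is abbreviated):

[hadoop@localhost ~]$ echo $HIVE_HOME
/srv/apache-hive
[hadoop@localhost ~]$ hive --version
Hive 2.1.1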
Install the JDBC driver by linking it into Hive's lib directory:
[root@localhost apache-hive]# ln -s /usr/share/java/mysql-connector-java.jar /srv/apache-hive/lib/
[root@localhost apache-hive]# ll /srv/apache-hive/lib/mysql-connector-java.jar
lrwxrwxrwx 1 root root 40 Jun 29 01:59 /srv/apache-hive/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar
Edit the hive-site.xml configuration file to set up the working directories:
<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive</value>
</property>
<property>
    <name>system:user.name</name>
    <value>hadoop</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive/hadoop</value>
    <description>Location of Hive run time structured log file</description>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
    <description>Local scratch space for Hive jobs</description>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>
To switch the metastore from the default Derby to MySQL, the MySQL connection information must be configured in the same file. The Derby defaults look like this:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
    <description>password to use against metastore database</description>
</property>
Change the value of each property above to the following. Note that the & characters in the JDBC URL must be escaped as &amp;amp; in XML:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
</property>
Before starting Hive you must do two things: create the HDFS directories and initialize the MySQL database.
Create the HDFS working directories for Hive and give them the appropriate permissions:
[root@localhost ~]$ su - hadoop
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -mkdir -p /tmp/hive
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod g+w /user/hive/warehouse
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -chmod 777 /tmp/hive
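Optionally verify that the directories were created:

[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -ls /user/hive
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -ls /tmp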
Initialize the MySQL database:
[hadoop@localhost ~]$ /srv/apache-hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
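If initialization succeeded, the metastore tables now exist in the MySQL hive database; a quick check should list tables such as DBS, TBLS and SDS:

mysql -u hive -phive hive -e "SHOW TABLES;"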
If you use the embedded Derby database instead, initialize it with the following command:
schematool -initSchema -dbType derby
Start Hadoop:
[hadoop@localhost ~]$ /srv/apache-hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /srv/apache-hadoop-2.8.0/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting yarn daemons
starting resourcemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /srv/apache-hadoop-2.8.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
Enter the Hive shell and run show databases; to test that the installation works:
[hadoop@localhost conf]$ /srv/apache-hive/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/srv/apache-hive-2.1.1/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 0.264 seconds, Fetched: 1 row(s)
hive>
At this point, Apache Hive is fully installed and configured!
Try running the following SQL statements to check that Hive works correctly:
create database test;
use test;
create table test_table (id int, name string) row format delimited fields terminated by ',' stored as textfile;
insert into test_table values (1, 'Neo');
select * from test_table;
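Behind the scenes, Hive stores the table's data as a plain text file under the HDFS warehouse directory. A sketch of how to inspect it, assuming the default warehouse layout (the 000000_0 file name is typical but may differ):

[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -ls /user/hive/warehouse/test.db/test_table/
[hadoop@localhost ~]$ /srv/apache-hadoop/bin/hdfs dfs -cat /user/hive/warehouse/test.db/test_table/000000_0
1,Neo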
hiveserver2 provides remote access to the Hive service: users connect to Hive by IP address and port, much like a mysql client. Start it in the foreground with:
[hadoop@localhost ~]$ /srv/apache-hive/bin/hiveserver2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Check the listening port:
[hadoop@localhost bin]$ ss -lnt | grep 10000
LISTEN     0      50           *:10000                    *:*
Test that beeline can connect:
[hadoop@localhost ~]$ /srv/apache-hive/bin/beeline -u jdbc:hive2://
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://
17/06/29 22:01:16 [main]: WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
17/06/29 22:01:16 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1 by Apache Hive
0: jdbc:hive2://> show databases;
OK
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (1.318 seconds)
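To test a real network connection through hiveserver2 rather than the embedded jdbc:hive2:// URL above, point beeline at port 10000. The -n option supplies the user to run as; if the connection is rejected, check HiveServer2's authentication and impersonation settings:

[hadoop@localhost ~]$ /srv/apache-hive/bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop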
In a production environment, start it with the following method instead:
[hadoop@localhost ~]$ /srv/apache-hive/bin/hive --service hiveserver2 &
[1] 20375
[hadoop@localhost ~]$ SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/srv/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/srv/apache-hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
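To keep hiveserver2 running after you log out and capture its output in a log file, a common variant (the log file path is just an example):

nohup /srv/apache-hive/bin/hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &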