
138.4. Distributed Installation (CentOS 6 + hadoop-1.1.2)


HDFS:
      NameNode          : management (master) node
      DataNode          : data storage node
      SecondaryNameNode : metadata backup and checkpoint node

MapReduce:
      JobTracker        : job management node
      TaskTracker       : task execution node

138.4.1. Preparation

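Prepare the servers, each with a minimal installation of CentOS 6.4. The layout below uses three hosts; a fourth is only needed if the JobTracker gets a machine of its own.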


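NameNode     192.168.2.10   hostname: namenode
DataNode     192.168.2.11   hostname: datanode1
DataNode     192.168.2.12   hostname: datanode2

JobTracker   192.168.2.10   (can run on a dedicated server or share the NameNode host; this setup only uses HDFS, so the JobTracker is optional and the three hosts above are sufficient)
TaskTracker  (shares the DataNode hosts)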

Configure networking so the servers can reach each other, then disable the firewall and SELinux.

# yum update -y
# lokkit --disabled --selinux=disabled
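`lokkit --selinux=disabled` is expected to record the new mode in /etc/selinux/config, which only fully applies after a reboot; `setenforce 0` switches the running system to permissive mode until then. A minimal sketch for reading the configured mode back (selinux_config_mode is a hypothetical helper, not a CentOS command):

```shell
# Hypothetical helper: print the SELINUX= mode recorded in an SELinux config file.
# Defaults to /etc/selinux/config; pass another path to inspect a copy.
selinux_config_mode() {
    local conf=${1:-/etc/selinux/config}
    # Only uncommented SELINUX= lines match; SELINUXTYPE= does not,
    # because the character after "SELINUX" must be "=".
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*//p' "$conf" | tail -n 1
}

# Usage on a node:
#   selinux_config_mode    # mode that applies after the next reboot
#   getenforce             # mode of the running system
```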
			

Important Hadoop ports


1. JobTracker web UI:           50030
2. HDFS (NameNode) web UI:      50070
3. HDFS RPC port:               9000  (set by fs.default.name below)
4. MapReduce (JobTracker) RPC:  9001  (set by mapred.job.tracker below)

Procedure 138.3. Hadoop - Preparation

  1. Install the Java runtime on all servers

    Using CentOS 6.4 as an example:

    # yum install java-1.7.0-openjdk
    					
  2. Install Hadoop on all servers

    Two installation methods are available, RPM or YUM; choose one of them.

    # rpm -ivh http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2-1.x86_64.rpm
    Retrieving http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2-1.x86_64.rpm
    Preparing...                ########################################### [100%]
       1:hadoop                 ########################################### [100%]
    					
    yum localinstall http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2-1.x86_64.rpm
    					

    If the network is slow, download the package first with wget or axel, then install it locally:

    wget http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2-1.x86_64.rpm
    yum localinstall hadoop-1.1.2-1.x86_64.rpm
    					

    Hadoop users created by the package:

    # cat /etc/passwd | grep Hadoop
    mapred:x:202:123:Hadoop MapReduce:/tmp:/bin/bash
    hdfs:x:201:123:Hadoop HDFS:/tmp:/bin/bash
    					
  3. Configure the /etc/hosts file on all servers

    					
    cat >> /etc/hosts <<EOD
    
    ###############################
    # Hadoop Host
    ###############################
    #NameNode
    192.168.2.10 	namenode.example.com
    
    #DataNode
    192.168.2.11 	datanode1.example.com
    192.168.2.12 	datanode2.example.com
    
    EOD
    					
    					
  4. Generate an SSH key pair (on the NameNode)

    					
    # ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    cc:6f:30:76:82:28:96:13:c8:e6:bc:d7:5b:2d:11:d7 root@images-upload
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |..        .      |
    |.o.    . . E     |
    |+  o . +o        |
    | o= . ..S .      |
    | ..o.  .o*       |
    | . . . o .o      |
    |  .   o ..       |
    |     .           |
    +-----------------+
    					
    					
  5. Install the public key

    Install the public key on every DataNode server:

    					
    ssh-copy-id root@datanode1.example.com
    ssh-copy-id root@datanode2.example.com
    					
    					

    Answer yes, then enter the password, and the public key is installed. The session looks similar to this:

    # ssh-copy-id root@datanode1.example.com
    The authenticity of host 'datanode1.example.com (192.168.2.11)' can't be established.
    RSA key fingerprint is f1:0b:b1:63:1a:f6:ac:a3:da:4f:14:b5:f0:cc:df:67.
    Are you sure you want to continue connecting (yes/no)? yes    (type yes)
    Warning: Permanently added 'datanode1.example.com' (RSA) to the list of known hosts.
    root@datanode1.example.com's password:    (enter the password)
    Now try logging into the machine, with "ssh 'root@datanode1.example.com'", and check in:
    
      .ssh/authorized_keys
    
    to make sure we haven't added extra keys that you weren't expecting.
    
    # ssh-copy-id root@datanode2.example.com
    The authenticity of host 'datanode2.example.com (192.168.2.12)' can't be established.
    RSA key fingerprint is f1:0b:b1:63:1a:f6:ac:a3:da:4f:14:b5:f0:cc:df:67.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'datanode2.example.com,192.168.2.12' (RSA) to the list of known hosts.
    root@datanode2.example.com's password:
    Now try logging into the machine, with "ssh 'root@datanode2.example.com'", and check in:
    
      .ssh/authorized_keys
    
    to make sure we haven't added extra keys that you weren't expecting.
    
    					

    When finished, test the login. If you get a shell without being asked for a password, the key is installed correctly:

    # ssh root@datanode1.example.com
    # exit
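The key, /etc/hosts and login steps above can be scripted so that they are safe to re-run. A minimal sketch, run as root on the NameNode; the function names are hypothetical, and the SSH variable exists only so another command can stand in for ssh during a dry run:

```shell
# Create an RSA key pair without a passphrase unless one already exists.
ensure_ssh_key() {
    local key=${1:-$HOME/.ssh/id_rsa}
    if [ -f "$key" ]; then
        echo "key exists: $key"
        return 0
    fi
    mkdir -p -m 700 "$(dirname "$key")" || return 1
    # -N '' sets an empty passphrase so later logins do not prompt.
    ssh-keygen -q -t rsa -b 2048 -N '' -f "$key"
}

# Append "IP<TAB>NAME" to a hosts file unless NAME is already listed on an
# uncommented line, so re-running the step does not add duplicates.
add_host_entry() {
    local hosts_file=$1 ip=$2 name=$3
    if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                         END { exit !found }' "$hosts_file" 2>/dev/null; then
        echo "skip:  $name already in $hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
        echo "added: $ip $name"
    fi
}

# Check that each host accepts a key-based root login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_login() {
    local rc=0 host
    for host in "$@"; do
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok:   $host"
        else
            echo "FAIL: $host" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example: run `ensure_ssh_key`, then `add_host_entry /etc/hosts 192.168.2.11 datanode1.example.com` once per node, and after ssh-copy-id, `check_ssh_login datanode1.example.com datanode2.example.com`.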
    					

138.4.2. NameNode Configuration

Configuration files (located in /etc/hadoop for the RPM installation)

core-site.xml    common (core) properties
hdfs-site.xml    HDFS properties
mapred-site.xml  MapReduce properties
hadoop-env.sh    Hadoop environment variables
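All three *-site.xml files share the same shape: a <configuration> element holding <property> entries, each with a <name> and a <value>. A minimal sketch of a shell function that writes such a file from name/value pairs (write_site_xml is a hypothetical name; values are written verbatim, so they must not contain XML special characters such as & or <):

```shell
# Hypothetical helper: write a Hadoop *-site.xml file from name/value pairs.
# Usage: write_site_xml FILE NAME1 VALUE1 [NAME2 VALUE2 ...]
write_site_xml() {
    local file=$1
    shift
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "write_site_xml: expected name/value pairs" >&2
        return 1
    fi
    {
        echo '<?xml version="1.0"?>'
        echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
        echo '<configuration>'
        while [ $# -gt 0 ]; do
            printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Example: the core-site.xml used in this section.
#   write_site_xml core-site.xml \
#       fs.default.name hdfs://namenode.example.com:9000 \
#       hadoop.tmp.dir  /var/tmp/hadoop
```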
			

Procedure 138.4. Hadoop - NameNode

  1. Configuration file hadoop-env.sh

    Change JAVA_HOME from /usr/java/default to /usr, where the OpenJDK package provides bin/java:

    # cp hadoop-env.sh hadoop-env.sh.original
    # sed -i "s:/usr/java/default:/usr:" hadoop-env.sh
    					
  2. Configuration file core-site.xml: back up the original, then replace its contents with the following

    					
    # cp core-site.xml core-site.xml.original
    
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
             <name>fs.default.name</name>
             <value>hdfs://namenode.example.com:9000</value>
        </property>
        <property>
             <name>hadoop.tmp.dir</name>
             <value>/var/tmp/hadoop</value>
        </property>
    </configuration>
    					
    					

    fs.default.name: the URI of the NameNode, in the form hdfs://hostname:port/

    hadoop.tmp.dir: Hadoop's default temporary directory, used as the base for other temporary paths.

  3. Configuration file mapred-site.xml: back up the original, then replace its contents with the following

    					
    # cp mapred-site.xml mapred-site.xml.original
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>namenode.example.com:9001</value>
        </property>
        <property>
            <name>mapred.local.dir</name>
            <value>/var/tmp/hadoop</value>
        </property>
    </configuration>
    					
    					

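    mapred.job.tracker: the host and port of the JobTracker.

    mapred.local.dir: the local directory where MapReduce stores intermediate data files.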

  4. Configuration file hdfs-site.xml: back up the original, then replace its contents with the following

    					
    # cp hdfs-site.xml hdfs-site.xml.original
    
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.name.dir</name>
            <value>/var/hadoop/name1,/var/hadoop/name2</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/var/hadoop/hdfs/data1</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
    </configuration>
    					
    					
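    1) dfs.name.dir: the local filesystem path where the NameNode persistently stores the namespace and the transaction log. When the value is a comma-separated list of directories, the name table is replicated to all of them for redundancy.
    2) dfs.data.dir: a comma-separated list of local filesystem paths where a DataNode stores its block data. When several directories are listed, data is stored across all of them, typically on different devices.
    3) dfs.replication: the number of replicas kept for each block (default 3). If it is larger than the number of DataNodes in the cluster, blocks cannot reach the target count and are reported as under-replicated.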
    					
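  5. Configure the masters and slaves files

    In Hadoop 1.x, masters lists the host(s) that run the SecondaryNameNode, and slaves lists the hosts that run the DataNode and TaskTracker daemons.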

    Back up the original masters and slaves files:

     cp masters masters.original
     cp slaves slaves.original
    					
    					
    cat > /etc/hadoop/masters <<EOD
    namenode.example.com
    EOD
    					
    					
    					
    cat > /etc/hadoop/slaves <<EOD
    datanode1.example.com
    datanode2.example.com
    EOD
    					
    					
  6. Copy the configuration files

    # cd /etc/hadoop/
    # scp hadoop-env.sh core-site.xml mapred-site.xml hdfs-site.xml masters slaves root@datanode1.example.com:/etc/hadoop/
    # scp hadoop-env.sh core-site.xml mapred-site.xml hdfs-site.xml masters slaves root@datanode2.example.com:/etc/hadoop/
    					

    Console output similar to the following indicates a successful copy:

    # scp hadoop-env.sh core-site.xml mapred-site.xml hdfs-site.xml masters slaves root@datanode1.example.com:/etc/hadoop/
    hadoop-env.sh                                                                          100% 2116     2.1KB/s   00:00
    core-site.xml                                                                          100%  412     0.4KB/s   00:00
    mapred-site.xml                                                                        100%  406     0.4KB/s   00:00
    hdfs-site.xml                                                                          100%  595     0.6KB/s   00:00
    masters                                                                                100%   21     0.0KB/s   00:00
    slaves
    					

    This copies the configuration files from the NameNode to the DataNodes.

  7. Start Hadoop

    Create the working directories, format the NameNode, then start the daemons:

    # mkdir -p /var/hadoop/name{1,2} /var/hadoop/hdfs/data{1,2}
    					
    # hadoop namenode -format
    13/04/23 14:35:33 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = namenode.example.com/192.168.2.10
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.1.2
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:06:43 UTC 2013
    ************************************************************/
    Re-format filesystem in /var/hadoop/name1 ? (Y or N) Y
    13/04/23 14:35:37 INFO util.GSet: VM type       = 64-bit
    13/04/23 14:35:37 INFO util.GSet: 2% max memory = 2.475 MB
    13/04/23 14:35:37 INFO util.GSet: capacity      = 2^18 = 262144 entries
    13/04/23 14:35:37 INFO util.GSet: recommended=262144, actual=262144
    13/04/23 14:35:37 INFO namenode.FSNamesystem: fsOwner=root
    13/04/23 14:35:37 INFO namenode.FSNamesystem: supergroup=supergroup
    13/04/23 14:35:37 INFO namenode.FSNamesystem: isPermissionEnabled=true
    13/04/23 14:35:37 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    13/04/23 14:35:37 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    13/04/23 14:35:38 INFO namenode.NameNode: Caching file names occuring more than 10 times
    13/04/23 14:35:38 INFO common.Storage: Image file of size 110 saved in 0 seconds.
    13/04/23 14:35:38 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/var/hadoop/name1/current/edits
    13/04/23 14:35:38 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/var/hadoop/name1/current/edits
    13/04/23 14:35:38 INFO common.Storage: Storage directory /var/hadoop/name1 has been successfully formatted.
    13/04/23 14:35:38 INFO common.Storage: Image file of size 110 saved in 0 seconds.
    13/04/23 14:35:38 INFO namenode.FSEditLog: closing edit log: position=4, editlog= /var/hadoop/name2/current/edits
    13/04/23 14:35:38 INFO namenode.FSEditLog: close success: truncate to 4, editlog= /var/hadoop/name2/current/edits
    13/04/23 14:35:38 INFO common.Storage: Storage directory  /var/hadoop/name2 has been successfully formatted.
    13/04/23 14:35:38 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at namenode.example.com/192.168.2.10
    ************************************************************/
    					
    # chown hdfs:hadoop -R /var/hadoop
    					
    # /etc/init.d/hadoop-namenode start
    # /etc/init.d/hadoop-datanode start
    					

Once the NameNode is running, the HDFS web UI is available at http://192.168.2.10:50070/
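After formatting and starting the NameNode, a few sanity checks can be scripted. A minimal sketch with hypothetical helper names, assuming the one-element-per-line XML layout used in this section (get_property is a line-based text match, not an XML parser). check_replication only counts hosts in the slaves file, so a DataNode also started on the NameNode itself is not counted.

```shell
# Print the <value> of property NAME from a *-site.xml file (FILE NAME).
get_property() {
    awk -v want="$2" '
        /<name>/ { n = $0; sub(/.*<name>[ \t]*/, "", n); sub(/[ \t]*<\/name>.*/, "", n) }
        /<value>/ && n == want { v = $0; sub(/.*<value>[ \t]*/, "", v); sub(/[ \t]*<\/value>.*/, "", v); print v; exit }
    ' "$1"
}

# Fail when dfs.replication exceeds the number of hosts in the slaves file;
# HDFS could then not place every replica (blocks stay under-replicated).
check_replication() {
    local hdfs_site=$1 slaves=$2 rep nodes
    rep=$(get_property "$hdfs_site" dfs.replication)
    rep=${rep:-3}    # dfs.replication defaults to 3 when unset
    nodes=$(grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves")
    if [ "$rep" -gt "$nodes" ]; then
        echo "WARN: dfs.replication=$rep > $nodes DataNode(s) in $slaves" >&2
        return 1
    fi
    echo "ok: dfs.replication=$rep, $nodes DataNode(s)"
}

# After "hadoop namenode -format", every directory listed in dfs.name.dir
# should contain current/VERSION.
check_name_dirs() {
    local dirs d rc=0
    dirs=$(get_property "$1" dfs.name.dir)
    if [ -z "$dirs" ]; then
        echo "dfs.name.dir is not set in $1" >&2
        return 1
    fi
    for d in $(echo "$dirs" | tr ',' ' '); do
        if [ -f "$d/current/VERSION" ]; then
            echo "ok:      $d"
        else
            echo "MISSING: $d/current/VERSION" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage on the NameNode:
#   cd /etc/hadoop
#   get_property core-site.xml fs.default.name
#   check_replication hdfs-site.xml slaves
#   check_name_dirs hdfs-site.xml
```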

138.4.3. DataNode Configuration

Procedure 138.5. Hadoop - DataNode

  1. Create the Hadoop data storage directory

    					
    # mkdir /var/hadoop/
    # chown hdfs:hadoop -R /var/hadoop
    # su - hdfs -c "mkdir -p /var/hadoop/hdfs/data1"
    					
    					
  2. Start Hadoop (the DataNode service)

    					
    # /etc/init.d/hadoop-datanode start
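If the DataNode fails to start, one thing worth checking is that its storage directories still belong to the hdfs user, since they are meant to (hence the chown above). A sketch of such a check; check_owner is a hypothetical name and relies on GNU stat's -c option, which CentOS provides:

```shell
# Hypothetical pre-flight check: report every directory that is not owned
# by USER. Usage: check_owner USER DIR...
check_owner() {
    local user=$1 rc=0 dir owner
    shift
    for dir in "$@"; do
        # stat fails (and prints an error) if the directory does not exist.
        owner=$(stat -c %U "$dir") || { rc=1; continue; }
        if [ "$owner" = "$user" ]; then
            echo "ok:    $dir ($owner)"
        else
            echo "WRONG: $dir is owned by $owner, expected $user" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage on a DataNode:
#   check_owner hdfs /var/hadoop /var/hadoop/hdfs/data1 || chown -R hdfs:hadoop /var/hadoop
```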
    					
    					

138.4.4. Hadoop UI (Web Interface)

Commonly used pages:


1. HDFS UI
        http://hostname:50070
2. MapReduce (JobTracker) UI
        http://hostname:50030
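The pages can also be probed from the command line. A minimal sketch using curl; check_web_ui is a hypothetical name, and the CURL variable only exists so another command can be substituted. Because this setup only runs HDFS, port 50030 is expected to report 000 (no HTTP response) unless a JobTracker has been started.

```shell
# Print "PORT HTTP_STATUS" for the HDFS (50070) and JobTracker (50030) UIs.
# curl reports 000 when there is no HTTP response (refused or timed out).
check_web_ui() {
    local host=$1 port code
    for port in 50070 50030; do
        code=$(${CURL:-curl} -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$host:$port/")
        echo "$port $code"
    done
}

# Usage:
#   check_web_ui 192.168.2.10
```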
        

138.4.5. Testing Hadoop

Copy the install.log file into the distributed filesystem:

# hadoop fs -mkdir test
# hadoop fs -put install.log test
			

Display the file contents:

# hadoop dfs -cat test/install.log
			

View the directory structure:

# hadoop dfs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2013-04-23 15:20 /user/root/test
[root@namenode ~]# hadoop dfs -ls test
Found 1 items
-rw-r--r--   2 root supergroup      10278 2013-04-23 15:20 /user/root/test/install.log
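To confirm that the file survived the round trip, its contents can be read back from HDFS and compared with the local copy. A sketch using POSIX cksum; verify_hdfs_copy is a hypothetical name, and HADOOP can be set to another command for testing:

```shell
# Compare a local file with its copy in HDFS by checksumming both byte streams.
# Usage: verify_hdfs_copy LOCAL_FILE HDFS_PATH
verify_hdfs_copy() {
    local local_file=$1 hdfs_path=$2 want got
    want=$(cksum < "$local_file") || return 1
    got=$(${HADOOP:-hadoop} fs -cat "$hdfs_path" | cksum)
    if [ "$want" = "$got" ]; then
        echo "OK: $hdfs_path matches $local_file"
    else
        echo "MISMATCH: $hdfs_path differs from $local_file" >&2
        return 1
    fi
}

# Usage, after the hadoop fs -put above:
#   verify_hdfs_copy install.log test/install.log
```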