
7. Hadoop - HDFS

http://hadoop.apache.org/

Java

$ sudo apt-get install openjdk-6-jre-headless

$ sudo vim /etc/profile.d/java.sh
################################################
### Java environment by neo
################################################
export JAVA_HOME=/usr
export JRE_HOME=/usr
export PATH=$PATH:/usr/local/apache-tomcat/bin/:/usr/local/jetty-6.1.18/bin:/usr/local/apache-nutch/bin
export CLASSPATH="./:/usr/share/java/:/usr/local/apache-solr/example/multicore/lib"
export JAVA_OPTS="-Xms128m -Xmx1024m"

Procedure 6.1. Master configuration

  1. Downloading and installing the software

    $ cd /usr/local/src
    $ wget http://apache.etoak.com/hadoop/core/hadoop-0.20.0/hadoop-0.20.0.tar.gz
    $ tar zxvf hadoop-0.20.0.tar.gz
    $ sudo cp -r hadoop-0.20.0 /usr/local/
    $ cd /usr/local
    $ sudo ln -s hadoop-0.20.0 hadoop
    $ cd hadoop
  2. Configuration

    hadoop-env.sh

    $ vim conf/hadoop-env.sh
    export JAVA_HOME=/usr

    conf/core-site.xml

    $ vim conf/core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    conf/hdfs-site.xml

    $ vim conf/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    conf/mapred-site.xml

    $ vim conf/mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
  3. Set up passphraseless SSH

    Check that you can ssh to localhost without a passphrase:

    $ ssh localhost

    If you cannot, generate a key pair and authorize it:

    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  4. Execution

    Format a new distributed filesystem:

    $ bin/hadoop namenode -format

    Start the Hadoop daemons:

    $ bin/start-all.sh

    When you're done, stop the daemons with:

    $ bin/stop-all.sh
  5. Monitor

    Browse the web interface for the NameNode and the JobTracker; by default they are available at:

    • NameNode - http://localhost:50070/

    • JobTracker - http://localhost:50030/

  6. Test

    $ bin/hadoop dfs -mkdir test
    $ echo helloworld > testfile
    $ bin/hadoop dfs -copyFromLocal testfile test/
    $ bin/hadoop dfs -ls
    Found 1 items
    drwxr-xr-x   - neo supergroup          0 2009-07-10 14:18 /user/neo/test

    $ bin/hadoop dfs -ls test

    $ bin/hadoop dfs -cat test/testfile

Procedure 6.2. Slave configuration

  1. SSH

    Copy the master's public key to the slave and authorize it:

    $ scp neo@master:~/.ssh/id_dsa.pub .ssh/master.pub
    $ cat .ssh/master.pub >> .ssh/authorized_keys
  2. Hadoop

    Copy the Hadoop installation from the master (note the -r flag, needed for a recursive directory copy):

    $ scp -r neo@master:/usr/local/hadoop /usr/local/hadoop