I have already shown you how to use Pig from the Ambari web interface. Now I will show you how to use the Grunt shell to run Pig scripts from the command line.
First things first. Before you can use the command terminal, you need to log into the sandbox VM through the VMware console. Click on your running VM and press Alt-F5.
Log in with the username root and the password hadoop. You will then be asked to create a new password.
Now go to http://127.0.0.1:8888 and click View Advanced Options
Here you will see instructions for accessing Hadoop via SSH Client (command line access).
Also, in case someone out there doesn't know: localhost and 127.0.0.1 mean the same thing and are interchangeable.
One of the best ways to access SSH from Windows is through a free program called PuTTY. You can find a link for it here: PuTTY
Once you have it downloaded, run the exe file and fill in the information as seen below: Host Name (127.0.0.1 or localhost) and Port 2222. Make sure SSH is selected. If you want, you can save these settings like I did. I named mine HadoopLearning.
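If you would rather not use PuTTY (for example, on macOS or Linux, or on Windows with OpenSSH installed), you can connect with a plain ssh command instead. A minimal sketch, assuming the sandbox forwards SSH to localhost on port 2222 as described above:

```shell
# Connect to the sandbox over the forwarded SSH port (2222)
ssh -p 2222 root@127.0.0.1
```

You will get the same login prompt, and everything that follows works exactly the same way.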
Next, click Open
Log in using root and your new password
Okay, now for a quick explanation of something that may seem confusing to some. You are currently logged into a Linux machine (CentOS, to be exact) that came prepackaged with Hadoop already installed. So, in order to interact with Hadoop, we need to start each command with one of the following two prefixes: hdfs dfs or hadoop fs. Either one works; it is just a matter of personal preference. I like hdfs dfs, but feel free to use hadoop fs.
hdfs dfs -ls /
This command gives me a listing (ls) of the root HDFS directory.
I know my files are in /user/maria_dev, so let's look in there now.
hdfs dfs -ls /user/maria_dev
You can see I have some CSV files in this folder that we can work with.
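If your folder does not have any CSV files yet, you can copy one up from the local Linux filesystem first. A hedged sketch, assuming a file named UltrasoundPrice.csv sits in your current directory (that is the file used later in this post):

```shell
# Copy a local CSV into HDFS, then confirm it arrived
hdfs dfs -put UltrasoundPrice.csv /user/maria_dev/
hdfs dfs -ls /user/maria_dev
```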
Now that we are in a working terminal and have data, the next step is to start Pig. Pay close attention, because this part is very difficult. Go to your command terminal and type: pig
You will see the terminal working and you should end up with a prompt that says grunt>
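One handy aside before we run anything: the Grunt shell can run HDFS commands directly with the fs keyword, so you can double-check your files without leaving Pig. For example, using the same folder as above:

```shell
# Inside the Grunt shell (at the grunt> prompt), list the HDFS folder
fs -ls /user/maria_dev;
```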
uP = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING PigStorage(',') AS (Model, Price);
DUMP uP;
You will see the MapReduce job running.
And here are your results.