PIG: Use GRUNT to Access PIG from the Command Line

I have already shown you how to use PIG from the Ambari web interface. Now I will show you how to use GRUNT, PIG's interactive shell, to run PIG scripts from the command line.

First things first. Before you can use the command terminal, you need to log into Hadoop via the VMware terminal. Click on your running VM and hit Alt-F5.

Log in with the username root and the password hadoop. You will then be asked to create a new password.

PigGrunt.jpg

Now go to http://127.0.0.1:8888 and click View Advanced Options.

hortonworks8

Here you will see instructions for accessing Hadoop via SSH Client (command line access).

Also, in case someone out there doesn't know: localhost and 127.0.0.1 mean the same thing and are interchangeable.

PigGrunt1.jpg

Putty

One of the best ways to use SSH from Windows is through a free program called Putty. You can find a link for it here: putty

Once you have it downloaded, click on the exe file and fill in the information as seen below: IP or Hostname (127.0.0.1 or localhost) and Port 2222. Make sure SSH is checked. If you want, you can save these settings like I did. I named mine HadoopLearning.

Next, click Open.

PigGrunt2.jpg

Log in using root and your new password.

PigGrunt3.jpg
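If you are on Mac or Linux, or a Windows machine with an OpenSSH client installed, you may not need Putty at all. Here is a minimal sketch that builds the equivalent ssh command from the same settings used above (127.0.0.1, port 2222, user root). The variable names are just my own for illustration.

```shell
# Connection settings copied from the Putty screen above.
HOST=127.0.0.1
PORT=2222
LOGIN=root

# -p sets the port; LOGIN@HOST sets the user name and host.
SSH_CMD="ssh -p ${PORT} ${LOGIN}@${HOST}"

# Print the command so you can check it before connecting.
echo "Connect with: ${SSH_CMD}"
```

Run the printed command and enter your new password when prompted. You should land at the same Linux prompt that Putty gives you.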

Okay, now for a quick explanation of what will seem confusing to some. You are currently logged into a Linux computer (CentOS to be exact) that came prepackaged with Hadoop already installed. Plain Linux commands like ls only see the Linux file system, so to work with files stored in Hadoop (HDFS) we need to put one of the following two commands in front: hdfs dfs or hadoop fs. Either one works. It is just a matter of personal choice. I like hdfs dfs, but feel free to use hadoop fs.

hdfs dfs -ls /

This command gives me the listing (ls) of the root Hadoop (HDFS) folder.

PigGrunt4.jpg

I know my files are in /user/maria_dev, so let's look in there now.

hdfs dfs -ls /user/maria_dev

You can see I have some csv files in this folder we can work with.

PigGrunt5.jpg
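Before starting PIG, it can help to peek at the data. This is a small sketch, assuming your folder contains UltrasoundPrice.csv (the file I load later in this post). It repeats the listing with the hadoop fs form and prints the first few lines of the file. The check for hdfs is only there so the snippet fails gently if you run it outside the sandbox.

```shell
# Path assumed from the folder listing; adjust if your file lives elsewhere.
CSV=/user/maria_dev/UltrasoundPrice.csv

if command -v hdfs >/dev/null 2>&1; then
  # Same listing as before, using the equivalent hadoop fs form.
  hadoop fs -ls /user/maria_dev
  # Stream the file out of HDFS and show only the first five lines.
  hdfs dfs -cat "$CSV" | head -n 5
else
  echo "hdfs not found: run this inside the SSH session on the sandbox"
fi
```

Seeing the raw lines tells you which delimiter the file uses and how many columns to expect, which is what you need to know for the LOAD statement.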

Now that we are in a working terminal and have data, the next step is to start PIG. Now pay close attention, this part is very difficult. Go to your command terminal and type: pig

You will see the terminal working and you should end up with a prompt that says grunt>

PigGrunt6.jpg

grunt>

PigGrunt7.jpg

uP = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING PigStorage(',') as (Model, Price);

dump uP;

You will see the MapReduce job running.

PigGrunt8.jpg

And here are your results.

PigGrunt9.jpg
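You do not have to type statements at the grunt> prompt every time. As a rough sketch, the same two statements can be saved to a file and handed to the pig command, which runs them and exits. The file name ultrasound.pig and its location are my own choices for illustration, and the check for pig is only there so the snippet fails gently outside the sandbox.

```shell
# Hypothetical file name and location for the saved script.
SCRIPT="${TMPDIR:-/tmp}/ultrasound.pig"

# Write the two statements from the grunt> session into a script file.
cat > "$SCRIPT" <<'EOF'
-- Load the CSV from HDFS, splitting each line on commas into two fields.
uP = LOAD '/user/maria_dev/UltrasoundPrice.csv' USING PigStorage(',') as (Model, Price);
-- Print every tuple to the terminal.
dump uP;
EOF

if command -v pig >/dev/null 2>&1; then
  # Passing a file name runs the script instead of opening grunt>.
  pig "$SCRIPT"
else
  echo "pig not found: run this inside the SSH session on the sandbox"
fi
```

The output should match what dump uP; printed in the interactive session, since the statements are identical.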

 
