#Kicker=Cloud Computing
#Subtitle=Open-Source Cloud Computing Platform Technologies (9)
#Headline=Pig: The Little Pig Dancing on Hadoop's Shoulders
#Author=Text and figures by 沈炳宏

============= Listing 1
-- Find the five most-visited URLs among users aged 18 to 25.
Users = load 'users' as (name, age);
Fltrd = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
-- Join the filtered users with their page visits, then count visits per URL.
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
-- Sort by click count, highest first, and keep the top five.
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into 'top5sites';
================

============= Listing 2
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
================

============= Listing 3
-- Count the number of queries issued by each user.
log = LOAD 'excite-small.log' AS (user, timestamp, query);
grpd = GROUP log BY user;
cntd = FOREACH grpd GENERATE group, COUNT(log);
STORE cntd INTO 'output';

Result:
bill 18
gates 18
================

============= Listing 4
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
================

============= Listing 5
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
================

============= Listing 6
register piggybank.jar;
DEFINE LogLoader org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader();
DEFINE DayExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM-dd');
================
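
To make concrete what the combined LogFormat and the DateExtractor('yyyy-MM-dd') definition above are set up to do, the following is a minimal Python sketch, not the piggybank implementation. It splits a combined-format log line into fields and reduces the timestamp to a yyyy-MM-dd day, then counts hits per day. The regular expression, the function names `extract_day` and `hits_per_day`, and the sample log lines are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical regex for the Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def extract_day(line):
    """Return the request day as 'YYYY-MM-DD', or None if the line does not parse."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # %t is written as day/month/year:hour:minute:second zone.
    # The day is taken in the timestamp's own UTC offset (no conversion).
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return ts.strftime("%Y-%m-%d")


def hits_per_day(lines):
    """Count parsed log lines per day, skipping lines that fail to parse."""
    return Counter(d for d in map(extract_day, lines) if d is not None)


# Invented sample lines, for illustration only.
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98)"',
    '127.0.0.1 - - [10/Oct/2000:14:02:11 -0700] "GET /b.html HTTP/1.0" '
    '404 - "-" "Mozilla/4.08 [en] (Win98)"',
    '10.0.0.7 - - [11/Oct/2000:09:00:00 -0700] "GET / HTTP/1.0" '
    '200 512 "-" "curl/7.0"',
]

if __name__ == "__main__":
    for day, hits in sorted(hits_per_day(SAMPLE).items()):
        print(day, hits)
```

Grouping by the extracted day and counting follows the same load, group, count shape as the Pig scripts above; in Pig the parsing step would be handled by the registered loader instead of a hand-written regular expression.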