
Update! My Twitpic scraper (as well as the search API calls) has been integrated into NCSU's Tweetgator. Check it out on GitHub!
A couple of months ago, IU East revamped its Twitter wall. We incorporated a codebase originally developed by NCSU, and I then extended it by adding inline hashtag searching and a Twitpic scraper.
At the time I wrote it, I could not find any other existing Twitpic scraper; Twitpic doesn't have a formal API (or at least, it didn't then, and I don't think it does at the time of this writing, either).
Effectively, what this script does (see after the jump) is browse the Twitpic site, parse out the image IDs, and then reconstruct URLs to the Twitpic images. It is somewhat rudimentary in that it neither caches nor actually downloads the images; anyone reading this may feel free to extend the code in that direction.
The script below is licensed under the GPL2, with all the requirements, freedoms, and obligations therein.
<?php
define('USERNAME', 'yourtwitterusername');

// How many pics to display by default
$quantity = (isset($_GET['qty']) && !empty($_GET['qty'])) ? (int) $_GET['qty'] : 8;

// The rendering format
$format = (isset($_GET['format']) && !empty($_GET['format'])) ? $_GET['format'] : "json";

// The user's Twitpic gallery page to scrape
$url = "http://www.twitpic.com/photos/" . USERNAME;

define('MINI_URL',  'http://twitpic.com/show/mini/%1$s');
define('THUMB_URL', 'http://twitpic.com/show/thumb/%1$s');
define('LARGE_URL', 'http://twitpic.com/show/large/%1$s');
define('PIC_URL',   'http://twitpic.com/%1$s');

// The formatting string, used for LI based outputs.
// PIC_URL and THUMB_URL both contain the %1$s placeholder, so a single
// sprintf() call fills in the photo ID for the link and the thumbnail.
$liFormat = '<li> <a href="' . PIC_URL . '" title="Twitpic"> <img src="' . THUMB_URL . '" alt="twitpic" /> </a> </li>';

///// CURL down the Twitpic data /////////////////////////////
// See below for discussion on why we're not RegExing for images directly
$searchForPhotos = '~<a href="/(\w+)">~';

$ch = curl_init($url);
$photoIDs = array();
$photos = array();

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$temp = curl_exec($ch);
curl_close($ch);

// $photoIDs[0] holds the full matches and $photoIDs[1] the captured IDs;
// shifting off the full matches leaves the IDs at index 0.
preg_match_all($searchForPhotos, $temp, $photoIDs);
array_shift($photoIDs);
$photoIDs = array_slice($photoIDs[0], 0, $quantity);

$output = "";

switch ($format) {
    case "LI":
        foreach ($photoIDs as $id) {
            $output .= sprintf($liFormat, $id);
        }
        break;
    default:
        ///// Parse out the raw data into a usable format (JSON) //////
        foreach ($photoIDs as $id) {
            $photos[$id]["mini"]  = sprintf(MINI_URL, $id);
            $photos[$id]["thumb"] = sprintf(THUMB_URL, $id);
            $photos[$id]["full"]  = sprintf(LARGE_URL, $id);
            $photos[$id]["url"]   = sprintf(PIC_URL, $id);
        }
        $output = json_encode($photos);
        break;
}

echo $output;
?>
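For completeness, here is a minimal sketch of how another page might consume the JSON output server-side. The twitpics.php filename and example.com host are placeholders of my own, not anything the script above defines; point the URL at wherever you actually save the scraper.

<?php
// Hypothetical consumer: fetch the scraper's JSON output and render thumbnails.
// "twitpics.php" and "example.com" are assumptions, not part of the script above.
$json   = file_get_contents('http://example.com/twitpics.php?qty=4&format=json');
$photos = json_decode($json, true);

foreach ($photos as $id => $urls) {
    // Each entry carries "mini", "thumb", "full", and "url" keys (see above)
    echo '<a href="' . $urls['url'] . '" title="Twitpic">'
       . '<img src="' . $urls['thumb'] . '" alt="twitpic" />'
       . '</a>';
}
?>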
The main challenge I encountered initially was that Twitpic obscures the way its images are served. It seems counter-intuitive to search for link tags instead of IMG tags, but if you try hotlinking directly to an IMG src, you'll find that it doesn't work (presumably it is blocked via .htaccess or something similar).
In the interest of fairness, it would probably be ideal to actually download and cache the images rather than hotlink them, and I would advise anyone who implements this script to do that.
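For anyone who takes that advice, here is a rough, untested sketch of the idea. The cache_twitpic_thumb() helper and the cache/ directory are names I made up for illustration, and the assumption is that the web server can write to that directory.

<?php
// Hypothetical caching helper (not part of the original script).
// Downloads a photo's thumbnail into a local cache/ directory once,
// then returns the local path on subsequent requests.
function cache_twitpic_thumb($id, $cacheDir = 'cache')
{
    $localPath = $cacheDir . '/' . $id . '.jpg';

    if (!file_exists($localPath)) {
        $ch = curl_init(sprintf('http://twitpic.com/show/thumb/%s', $id));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow any redirect to the actual image host
        $data = curl_exec($ch);
        curl_close($ch);

        if ($data !== false) {
            file_put_contents($localPath, $data);
        }
    }

    return $localPath;
}
?>

With something like that in place, the JSON loop above could return local paths for the thumb entries instead of hotlinked /show/ URLs.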
Once the page scrape has been parsed for link targets, we take those matches and build new URLs from them. The photos you can actually display offsite live at different URLs than the ones embedded in Twitpic's own pages, which are served out of Amazon's cloud web services. The Twitpic /show/ URLs are about as close to an API / web service as it gets, so that's what we have to work with!
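To make the URL-building concrete: for a hypothetical photo ID of abc123, the sprintf patterns in the script expand as follows.

// For a hypothetical photo ID "abc123", the patterns above expand to:
//   sprintf(MINI_URL,  'abc123')  -> http://twitpic.com/show/mini/abc123
//   sprintf(THUMB_URL, 'abc123')  -> http://twitpic.com/show/thumb/abc123
//   sprintf(LARGE_URL, 'abc123')  -> http://twitpic.com/show/large/abc123
//   sprintf(PIC_URL,   'abc123')  -> http://twitpic.com/abc123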
The IU East twitter wall populates the Twitpic photos with a quick Ajax call, which keeps the initial page load fast.
If you want to make the changes I mention above (or similar improvements), feel free to post them in the comments below (if the form will let you) or email them to me; I have a Gmail account and my username is "armahillo". I will happily modify the code above and credit you for your submission.