How To Automatically Download Linked PhotoBucket Images

Posted on 05 Jul

PhotoBucket recently changed their terms of service, and now only allows linking from third-party sites if you upgrade your account to their top-tier “Plus500” plan at ~$40 a month.

This has caused havoc across many forums and websites whose content contains images linked from PhotoBucket.

Since this also affected a community forum we help host, I decided to look into how we could sort this.

After a few Google searches, all I could find were threads on various forums where they tried to organize manual labor from their members to download the affected images.

Doing this manually never crossed my mind, so I looked into the network traffic as I accessed an image hosted at PhotoBucket, looking for a loophole I could use.

I quickly found out that the way they generate the thumbnail could be used to automate the process.

Thumbnail URL Example:
http://rs51.pbsrc.com/albums/f382/Sunkensie/Tile1_crop_zpskxc7xtrn.jpg?w=280&h=210&fit=crop

A quick test confirmed that if the width and height were changed to a very high number, and fit was changed to scale, it actually delivered the full-size image.
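As a minimal sketch of that rewrite, the helper below swaps the query string of a thumbnail URL for one asking for a very large scaled image. The function name fullSizeUrl is hypothetical; the w=10000&h=10000&fit=scale values are the ones used in the code further down in this post.

```php
<?php
// Hypothetical helper: request the full-size image by replacing the
// thumbnail's query string with a very large width/height and fit=scale.
function fullSizeUrl(string $thumbUrl): string {
    $pos = strpos($thumbUrl, '?');
    // Keep the whole URL when there is no query string to strip
    $base = ($pos === false) ? $thumbUrl : substr($thumbUrl, 0, $pos);

    return $base . '?w=10000&h=10000&fit=scale';
}

echo fullSizeUrl('http://rs51.pbsrc.com/albums/f382/Sunkensie/Tile1_crop_zpskxc7xtrn.jpg?w=280&h=210&fit=crop'), PHP_EOL;
```

Unlike a plain substr() up to the question mark, this version also copes with a URL that has no query string at all.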

To get the thumbnail URL, all that is needed is a POST call to http://photobucket.com/galleryd/search.php. Looking at the content sent with the call, it was easy to spot that the media ID they used was just Base64 encoded. This made it very easy to recreate the call information using the image URLs parsed from the forum posts.
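To illustrate the Base64 part, here is a small sketch that builds a media ID from a linked image URL and decodes it again. It mirrors the approach of the class at the end of this post; the function name and the example URL are hypothetical, and the URL layout (username as the sixth segment, file path after it) is an assumption taken from that class.

```php
<?php
// Hypothetical helper: derive the username and Base64 media id from a
// linked image URL of the assumed form
// http://<host>/albums/<album>/<username>/<path to file>
function photoBucketMediaId(string $imageUrl): array {
    $parts = explode('/', urldecode($imageUrl));
    $username = $parts[5] ?? '';
    // Everything after the username is the path of the image
    $path = implode('/', array_slice($parts, 6));

    return array('username' => $username,
                 'media_id' => base64_encode('path:' . $path));
}

$info = photoBucketMediaId('http://i51.photobucket.com/albums/f382/Sunkensie/Tile1.jpg');

// Decoding the media id shows the plain "path:..." string it wraps
echo $info['username'], ' ', base64_decode($info['media_id']), PHP_EOL;
```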

 

It quickly became clear that the code running on PhotoBucket’s web servers has its flaws, and that some servers in their cluster do not work properly.

From time to time when accessing the image links, you will get the error below. All you need to do to bypass it is refresh a few times.

Warning: include(/mnt/den2-pri1-mnt/code/code/pb-htdocs/public/galleryd/desktop.php): failed to open stream: No such file or directory in /code/pb-htdocs/public/galleryd/index.php on line 122

Warning: include(): Failed opening '/mnt/den2-pri1-mnt/code/code/pb-htdocs/public/galleryd/desktop.php' for inclusion (include_path='.:..:../..:/usr/local/lib/php:/usr/share/php/Smarty') in /code/pb-htdocs/public/galleryd/index.php on line 122
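In code, "refresh a few times" can be a small retry loop. The sketch below is one possible way to do it, not the code I ran: fetchWithRetry is a hypothetical helper that takes any callable returning the response body (or false) and tries again while the body contains the "failed to open stream" warning shown above.

```php
<?php
// Hypothetical helper: call $fetch until it returns a body without the
// include() warning, giving up after $maxAttempts tries.
function fetchWithRetry(callable $fetch, int $maxAttempts = 3, int $delaySeconds = 0) {
    for ($attempt = 1; $attempt <= $maxAttempts; ++$attempt) {
        $body = $fetch();

        $isError = ($body === false) || (strpos($body, 'failed to open stream') !== false);

        if (!$isError) {
            return $body;
        }

        // Optional pause before the next try
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    return false; // Every attempt failed
}
```

You could wrap a call such as $download->connectPhotoBucket(...) in a closure and pass it as $fetch.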

The same happens on calls made to the search.php page, where it throws these notices before the JSON output when it cannot locate the image. This usually means that the image has been deleted from the account.

Notice: Undefined index: mediaIndex in /code/pb-htdocs/public/galleryd/search.php on line 139

Notice: Undefined index: media in /code/pb-htdocs/public/galleryd/search.php on line 141

Warning: array_key_exists() expects parameter 2 to be array, null given in /code/pb-htdocs/public/galleryd/search.php on line 142
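Because these notices are printed in front of the JSON, passing the raw response to json_decode() fails. A minimal sketch of one workaround is to skip everything before the first opening brace; decodeJsonAfterNotices is a hypothetical helper and assumes the notices themselves never contain a "{".

```php
<?php
// Hypothetical helper: ignore any PHP notices printed before the JSON
// body and decode the rest. Returns null when no JSON object is found.
function decodeJsonAfterNotices(string $response) {
    $start = strpos($response, '{');

    if ($start === false) {
        return null;
    }

    return json_decode(substr($response, $start), true);
}

$raw = "Notice: Undefined index: media in /code/pb-htdocs/public/galleryd/search.php on line 141\n"
     . '{"mediaDocuments":{"media":[]}}';

var_dump(decodeJsonAfterNotices($raw));
```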

 

Since each forum software has its own way of handling content and images, I have added the base parts you will need below instead of posting the actual code I used. All you need to do is tailor them to fit what you want to parse.

This also allows you to decide whether you just want to host the images on your server and change the image tags in the posts to the new location (easiest by far), or add them as attachments in the forum software you use before tying them to the posts.

I strongly recommend that you create a database table to store the parsed and downloaded images together with a reference to the tag, post, and location on the server. Also store the tags of images you can’t download (marked as such), since this allows you to easily reprocess them later. Several times I noticed that images that initially failed could be downloaded after a few hours.
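What such a table could look like is sketched below. The table name, column names, and the MySQL-style types are all assumptions for illustration, as is the buildImageRow helper; adapt them to your own forum database.

```php
<?php
// Assumed MySQL-style schema; every name here is hypothetical.
$createTableSql = <<<SQL
CREATE TABLE pb_images (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  post_id INT UNSIGNED NOT NULL,
  original_tag TEXT NOT NULL,
  local_path VARCHAR(255) NULL,
  status VARCHAR(16) NOT NULL,
  last_attempt DATETIME NOT NULL
)
SQL;

// Hypothetical helper: build the row to insert for one image tag.
// A null $localPath means the download failed and should be retried later.
function buildImageRow(int $postId, string $originalTag, ?string $localPath): array {
    return array('post_id' => $postId,
                 'original_tag' => $originalTag,
                 'local_path' => $localPath,
                 'status' => ($localPath === null) ? 'failed' : 'downloaded',
                 'last_attempt' => date('Y-m-d H:i:s'));
}
```

Selecting the rows with status 'failed' later gives you the list of tags to try again.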

It is also wise to set up the code so that it processes in batches and automatically continues as each batch is completed.
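One simple way to batch, assuming you can collect the post ids up front, is to split them with array_chunk() and hand each chunk to a callback. The processInBatches name and the callback are hypothetical; in a web context you might instead reload the page per batch.

```php
<?php
// Hypothetical helper: run $processBatch once per chunk of post ids and
// return the number of batches that were processed.
function processInBatches(array $postIds, int $batchSize, callable $processBatch): int {
    $batches = 0;

    // max() guards against a batch size below 1, which array_chunk() rejects
    foreach (array_chunk($postIds, max(1, $batchSize)) as $batch) {
        $processBatch($batch);
        ++$batches;
    }

    return $batches;
}

$count = processInBatches(array(1, 2, 3, 4, 5), 2, function (array $batch) {
    echo 'Processing posts: ', implode(', ', $batch), PHP_EOL;
});
```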

The first thing you need to do is select the posts that you want to parse for linked images. Store the details you need (at least “post id” and “content of post”) in an array.

After you have found those you want to parse during this round, close the database connection. Whether this is really required depends on how many images are attached to the posts you want to parse. The reason we do it is so that the code won’t fail with a “database timeout” error on longer executions.

With the code examples below I have successfully downloaded around 16k images that were linked to PhotoBucket from the forum, and updated them to be hosted from the forum server instead.

If you have any questions, feel free to leave a comment below.

 

Here is a code example of a simple way to parse the post content:

$posts = array(); //Holds the posts you want to parse, keyed by post id, e.g. array('post_id' => 1, 'text' => '...')

$download = new DownloadImages(); //The class is listed further down in this post
$process_tags = array('img'); //All image tags your forum software allows that you want to download images from

foreach ($posts as $post_id => $post) {

  foreach ($process_tags as $tag) {

    if (preg_match_all('/\['.$tag.'([^\]]*)\]([^\[]+)\[\/'.$tag.'([^\]]*)\]/i', $post['text'], $matches) > 0) {

      //Process all matched [] tags
      $cnt = 1;

      foreach ($matches[2] as $key => $match_or) {
        $match = trim(html_entity_decode($match_or)); //The cleaned image url
        $match_or = '['.$tag.']'.html_entity_decode($match_or).'[/'.$tag.']'; //The tag rebuilt without attributes

        //isAlreadyProcessed() is a placeholder, not an existing function: implement this check yourself
        if (isAlreadyProcessed($post_id, $match)) {
          continue;
          }

        //Add your other code here to process located images
        }
      }
    }
  }

Please note that you need to implement the part where you check if the image has already been processed/downloaded, and of course set up the code for how you want to process any located images.

Note: If you only want to download PhotoBucket images, make certain you handle this in your code, or it will process all tags it locates.
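A sketch of such a filter is below. Both function names are hypothetical. It checks the host name of each URL rather than searching the whole string, so a link that merely contains "photobucket.com" somewhere in its path is not picked up.

```php
<?php
// Hypothetical helper: true when the URL's host is photobucket.com or
// one of its subdomains.
function isPhotoBucketUrl(string $url): bool {
    $host = parse_url($url, PHP_URL_HOST);

    if (!is_string($host)) {
        return false;
    }

    $host = strtolower($host);

    return $host === 'photobucket.com'
        || substr($host, -strlen('.photobucket.com')) === '.photobucket.com';
}

// Hypothetical helper: return the PhotoBucket urls found in [tag]...[/tag]
// pairs of a post, using the same pattern as the parsing loop above.
function extractPhotoBucketUrls(string $text, string $tag = 'img'): array {
    $quoted = preg_quote($tag, '/');
    $pattern = '/\[' . $quoted . '([^\]]*)\]([^\[]+)\[\/' . $quoted . '([^\]]*)\]/i';

    if (!preg_match_all($pattern, $text, $matches)) {
        return array();
    }

    $urls = array_map(function ($raw) {
        return trim(html_entity_decode($raw));
    }, $matches[2]);

    return array_values(array_filter($urls, 'isPhotoBucketUrl'));
}
```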

 

Here is the code I have used to process and find the correct download url for the PhotoBucket images:

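//If it is PhotoBucket, run its own rules on it
  if (strpos($match, 'photobucket.com') !== false) {
    $thumb_url = ''; //Reset so a value from a previously processed image is never reused
    //Note: $page (used further down) is not defined in this snippet; it is assumed to hold the file name of the linked image
    $info = $download->createPhotoBucketSearchDefault($match);

    $response = $download->connectPhotoBucket($info['base_url'], $info['username'], $info['media_id']);

    //This error usually indicates an old upload that uses the old system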
    if (strpos($response, 'java.lang.StringIndexOutOfBoundsException') !== false) {
      $info = $download->createPhotoBucketSearchOld($match);

      $response = $download->connectPhotoBucket($info['base_url'], $info['username'], $info['media_id']);

      //Just a failsafe if we get same error again, so we can look into why in that case
      if (strpos($response, 'java.lang.StringIndexOutOfBoundsException') !== false) {
        var_dump($response);
        echo $match;
        exit;
        }
      }

    //The image is no longer on their server
    if (strpos($response, 'com.photobucket.core.dao.exception.ObjectNotFoundException') !== false) {
      continue 1;
      }
    elseif (strpos($response, 'Unable to locate AlbumPreference identified by:') !== false) {
      continue 1;
      }

    $reply = json_decode($response, true);

    if (empty($reply['mediaDocuments']['thumbnailUrl']) && !empty($reply['mediaDocuments']['media'])) {

      foreach ($reply['mediaDocuments']['media'] as $media) {
        if (strpos($media['thumbnailUrl'], $page) !== false) {
          $thumb_url = $media['thumbnailUrl'];

          break 1;
          }
        }

      if (empty($thumb_url)) {
        $page2 = str_replace(' ', '%20', $page);

        foreach ($reply['mediaDocuments']['media'] as $media) {
          if (strpos($media['thumbnailUrl'], $page2) !== false) {
            $thumb_url = $media['thumbnailUrl'];

            break 1;
            }
          }
        }
      }
    elseif (!empty($reply['mediaDocuments']['thumbnailUrl'])) {
      $thumb_url = $reply['mediaDocuments']['thumbnailUrl'];
      }
    else {
      //Failsafe
      var_dump($response);
      echo $match;
      exit;
      }

    //Failsafe
    if (empty($thumb_url)) {
      echo $match.'<br><pre>';
      var_dump($reply);exit;
      }

    $match = substr($thumb_url, 0, strpos($thumb_url, '?')).'?w=10000&h=10000&fit=scale';
    }

The code handles all the cases I ran into when parsing images from the forum. I recommend leaving the crude and simple “failsafe” in place, just in case you hit any odd cases.

 

The code used to actually download the image to the server is below:

//Download the image
  $real_name = $post['post_id'].'_'.($down_id);
  $handle = fopen($image_dir.$real_name, 'wb');

  $response = $download->downloadImage($handle, $match);

  fclose($handle);

  //if the file downloaded is not an image (some save 404 html pages if the link is not present any more)
  if ($response !== false && (!file_exists($image_dir.$real_name) || getimagesize($image_dir.$real_name) === false)) {
    $response = false;
    }

It is straightforward: you set up the path to save the image and try to download it. Afterwards it verifies that an image was actually downloaded.
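The verification step can also be pulled into a small function, sketched here under the assumption that a saved file counts as valid when getimagesize() can read its header. The isValidImageFile name is hypothetical.

```php
<?php
// Hypothetical helper: true when $path exists, is not empty, and has an
// image header that getimagesize() recognises. A saved 404 HTML page
// fails this check.
function isValidImageFile(string $path): bool {
    return is_file($path)
        && filesize($path) > 0
        && @getimagesize($path) !== false;
}
```

Calling it right after fclose($handle) lets you mark the image as failed and remove the file when it returns false.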

 

Here is the class used to process and download the images.

class DownloadImages {

  /**
   * Create PhotoBucket Search Terms: Default
   *
   * @param $url
   * @return array
   */
  public function createPhotoBucketSearchDefault($url) {
    $temp = explode('/', urldecode($url));

    $total = count($temp);
    $encoded = '';

    for ($num=6;$num < $total;++$num) {
      if (!empty($encoded)) {
        $encoded .= '/';
        }

      $encoded .= $temp[$num];
      }

    $encoded = base64_encode('path:'.$encoded);

    return array('base_url' => 'http://photobucket.com/gallery/user/'.$temp[5].'/media/'.$encoded.'/?ref=',
                 'media_id' => $encoded,
                 'username' => $temp[5]);
    }


  /**
   * Create PhotoBucket Search Terms: Old
   *
   * @param $url
   * @return array
   */
  public function createPhotoBucketSearchOld($url) {
    $temp = explode('/', urldecode($url));
    $encoded = base64_encode('path:/'.end($temp));

    return array('base_url' => 'http://photobucket.com/gallery/user/'.$temp[5].'/media/'.$encoded.'/?ref=',
                 'media_id' => $encoded,
                 'username' => $temp[5]);
    }


  /**
   * PhotoBucket: Get Actual Url Details
   *
   * @param $baseUrl
   * @param $username
   * @param $mediaId
   * @return mixed
   */
  public function connectPhotoBucket($baseUrl, $username, $mediaId) {
    $handle = curl_init('http://photobucket.com/galleryd/search.php');
    //CURLOPT_HTTPHEADER expects a plain list of "Name: value" strings, not key => value pairs
    curl_setopt($handle, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
                                               'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.90 Safari/537.36 Vivaldi/1.91.867.38',
                                               'Origin: http://photobucket.com',
                                               'Referer: '.$baseUrl));
    curl_setopt($handle, CURLOPT_POST, true);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    //Passing a string makes cURL send the fields urlencoded, matching the Content-Type header above
    curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query(array('userName' => $username
                                               , 'searchTerm' => ''
                                               , 'mediaId' => $mediaId)));
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 5);

    $response = curl_exec($handle);

    curl_close($handle);

    return $response;
    }


  /**
   * Download Image
   *
   * @param $fileHandle
   * @param $url
   * @return mixed
   */
  public function downloadImage($fileHandle, $url) {
    $handle = curl_init($url);

    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($handle, CURLOPT_FILE, $fileHandle);
    curl_setopt($handle, CURLOPT_HEADER, 0);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($handle, CURLOPT_TIMEOUT, 20);
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);

    $response = curl_exec($handle);

    curl_close($handle);

    return $response;
    }
  }

About the author

Formally educated as an electronics engineer, Sven moved on to web development in 2004 after having it as a hobby for almost a decade. Over the last few years he has accumulated vast knowledge and experience in the field of complicated web-based applications, working with everything from transaction-based systems to high-traffic websites.

He is passionate about clean, efficient and secure code. When working on a project he will not budge until every security aspect has been taken care of.