PHP Google Blogsearch URL Scraper

Posted by DaPimp on November 28, 2008 – 12:56 pm

Sometimes you just need a crapload of URLs from WordPress blogs. It’s nobody’s business why you need them; if you need them, you need them.

Enter DaPimp’s Google Blogsearch URL scraper.

In a nutshell, this script grabs the first 1,000 results of WordPress blogs for a given keyword, and spits them out in a nice list for you.

**You’ll need PHP5, and a server with cURL enabled for this script to work**
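
Not sure whether your server ticks both boxes? Here is a rough sketch of a check you could upload and run first. `missingRequirements()` is a made-up helper name, not part of the scraper, and it assumes that testing for `curl_init()` and `SimpleXMLElement` is a good enough stand-in for "cURL enabled" and "PHP5".

```php
<?php
// Hypothetical helper (not part of the scraper): lists what this server lacks.
function missingRequirements()
{
    $missing = array();

    // The scraper relies on SimpleXMLElement, which arrived with PHP 5
    if (version_compare(PHP_VERSION, '5.0.0', '<')) {
        $missing[] = 'PHP 5 or later';
    }

    // curl_init() only exists when the cURL extension is loaded
    if (!function_exists('curl_init')) {
        $missing[] = 'cURL extension';
    }

    // SimpleXML ships with PHP 5 but can be switched off at build time
    if (!class_exists('SimpleXMLElement')) {
        $missing[] = 'SimpleXML extension';
    }

    return $missing;
}

$missing = missingRequirements();
echo empty($missing) ? "Good to go\n" : 'Missing: ' . implode(', ', $missing) . "\n";
```

If it lists anything as missing, sort that out with your host before bothering with the scraper.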

Instructions for use:

  1. You can either download the script here (change the file extension to .php), or just copy and paste the code below. WordPress can bugger up the quote marks in code, so if you paste, check that every quote is a plain straight one and fix any that aren’t. Just download the script, it’s much easier.
  2. Open the script in a text editor, and change the $keyword variable at the top to the keyword you want to search for
  3. Save the script and upload it to your server
  4. Navigate to the script in your browser and wait; you’ll get your list
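
How long is the wait in step 4? The script below pauses 30 seconds after each of its ten requests, so here is a back-of-the-envelope sketch of the paging. `pageOffsets()` is a hypothetical helper used only for this illustration, and the figure ignores however long Google takes to answer each request.

```php
<?php
// Hypothetical helper, not part of the scraper: the start offsets it requests
function pageOffsets($perPage, $maxResults)
{
    return range(0, $maxResults - $perPage, $perPage);
}

$offsets = pageOffsets(100, 1000); // 0, 100, 200 ... 900
$pause = 30;                       // seconds, same as the sleep(30) in the script

echo count($offsets) . ' requests, about ' . (count($offsets) * $pause / 60) . " minutes of pauses\n";
```

That prints "10 requests, about 5 minutes of pauses", so don’t panic if the browser sits there spinning for a while.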

============ Start PHP Script ================

<?php

//give the script a keyword to search for
$keyword = "ipod touch";
//URL-encode the keyword (spaces become +) so it is safe to drop into the feed URL
$keyword = urlencode($keyword);

//start a counter so we can number our results
$num = 0;

//set a start for our paging of Google Blogsearch (we’re going to be getting 10 pages X 100 results)
$start = 0;

do {

//Create the feed URL we're going to get from Google Blogsearch
//(%22 is a quote mark: the keyword and "powered by wordpress" each get their own pair)
$feed = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q=%22' .$keyword. '%22+%22powered+by+wordpress%22&ie=utf-8&num=100&start=' .$start. '&output=rss';

//We're using cURL to actually go fetch the page from Google Blogsearch
$ch = curl_init($feed);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
curl_close($ch);

//Bail out if the fetch failed, otherwise SimpleXMLElement chokes on it
if ($page === false) {
die('Could not fetch ' . $feed);
}

//Loop through the feed, and suck out the URLs
$xml = new SimpleXMLElement($page);

foreach ($xml->channel->item as $item) {

//Add 1 to our counter, so our list has numbers next to the URLs
$num = $num + 1;

//Cast to a string so we hold the plain URL, not a SimpleXMLElement object
$link = (string) $item->link;

//Print our shit to the page
echo $num. ' - <a href="' .$link. '">' .$link. '</a><br>';

}

//Have a rest so we don’t get banned for hitting Google too hard and fast
sleep(30);

//Add 100 to the start, so we can fetch the next 100 results
$start = $start + 100;

}

//Keep doing this shit until we get to page 10 of the Google results
while ($start < 1000);

?>

============ End PHP Script ================
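
If you want to poke at the parsing step without hammering Google, here is a minimal sketch that runs the same `$xml->channel->item` loop over a canned feed. `extractLinks()` is a hypothetical helper name and the two-item feed is made up for the example; the only assumption about a real Blogsearch response is that it has the usual RSS 2.0 layout the script above already expects.

```php
<?php
// Hypothetical helper (not in the script above): returns the <link> of every
// <item> in an RSS 2.0 document as a plain array of strings.
function extractLinks($rss)
{
    $links = array();
    $xml = new SimpleXMLElement($rss);

    foreach ($xml->channel->item as $item) {
        // Cast to string, otherwise we would be storing SimpleXMLElement objects
        $links[] = (string) $item->link;
    }

    return $links;
}

// Made-up two-item feed standing in for a Blogsearch response
$sample = '<rss version="2.0">
<channel>
<title>Sample feed</title>
<item><title>First</title><link>http://example.com/one</link></item>
<item><title>Second</title><link>http://example.com/two</link></item>
</channel>
</rss>';

foreach (extractLinks($sample) as $i => $link) {
    echo ($i + 1) . ' - ' . $link . "\n";
}
```

Run from the command line, this prints the two example.com links, numbered 1 and 2. Getting the links back as an array rather than echoing them straight away also makes it easy to de-dupe them or write them to a file later.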

This post is under “php” and has 15 sexy comments so far.

15 Sexy Comments so far- Get into it»

  1. Edmonton SEO Said:

    When I read this post I was wondering why we would want to do this, but then I loaded the script onto my server, played around a bit, and aha! This is super awesome. Thanks!!
    I might rework it a bit to create a nice little automated rank checker for my projects. Woo!

  2. Dennis Edell Said:

    Perhaps an example or two of why we would need such URLs?

  3. Christopher Kata Said:

    Great Script! I’ve been testing it out this morning and have found a few good uses for it!

  4. DaPimp Said:

    @Edmonton SEO - I’ve got a rank checker around here somewhere in the clutter that is my hard drive, I’ll dig it up and post it.

    @ Chris - It works especially well when it has a function to check pagerank of each of the pages, then check whether the blog is no-following its comments ;-)

  5. Your name Said:

    I might rework it a bit to create a nice little automated rank checker for my projects. Woo!

  6. DaPimp Said:

    @ Edmonton SEO & Your Name - it’s pulling results from Google BLOGSEARCH, not the normal Google SERPs

  7. Nelson McCrady Said:

    Thanks for this tool man! I just loaded it up on my server and it is awesome. I had to use Excel to split the numbers away from the URLs to load them up onto my ad network, but no big deal. I will be looking forward to your next script release! (Oh and I subscribed)

  8. DaPimp Said:

    @Nelson,

    Try adding the following:

    $myFile = "scrape.txt";
    $fh = fopen($myFile, 'a') or die("can't open file");
    $data = $link. "\r\n";
    fwrite($fh, $data);
    fclose($fh);

    under this line:

    echo $num. ' - <a href="' .$link. '">' .$link. '</a><br>';

    Run the script, and you’ll find a text file called “scrape.txt” in the same folder as the script, with the URLs each on a separate line.

    Hope this helps :-)

    Cheers

    Stu

  9. Digital Underground Said:

    Nice script, makes finding other blogs like my blogs quick.

  10. Jackie Said:

    great tool very handy!

  11. Jesica Amiden Said:

    Nice script, buddy. But is it even possible to fetch normal organic Google SERP results?

  12. DaPimp Said:

    @Jessica, yep, keep an eye out for the next post - some time in the next day or so.

    Cheers

    Stu

  13. Web Design Sussex Said:

    This is perfect for an upcoming project of mine! Thanks
