August 29, 2008 by Alex Polski

How to develop a good scraper in Perl - Lesson 1

If you want to develop a good scraper, Perl can be a very good choice. It has everything you need for the job: the WWW::Mechanize library, the HTML::TreeBuilder class, and thread support.

1. WWW::Mechanize is a full-featured library for automating interaction with websites. It simulates a user’s activity, such as clicking links and submitting forms, and has many other useful features. Let’s look at the code:

use strict;
use warnings;
use WWW::Mechanize;

# create the mechanize object
my $mech = WWW::Mechanize->new();

# set the user agent string
$mech->agent('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');

# go to http://www.example.com/
$mech->get('http://www.example.com/');

# follow the link whose text is 'Some text'
$mech->follow_link(text => 'Some text');

# fill in and submit the form named 'search'
$mech->submit_form(
  form_name => 'search',
  fields    => { query => 'Some text' },
  button    => 'Search Now'
);

Of course, the example above uses only basic features. You can find the full list in the WWW::Mechanize documentation on CPAN.
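
Once a page is loaded, you usually want to look at what came back. Here is a minimal sketch of that, written so it runs offline: it builds a made-up two-page site in a temporary directory and loads it through a file:// URL. The file names, page titles and the write_page helper are hypothetical and only there for illustration. It assumes WWW::Mechanize and URI are installed from CPAN.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical helper (not part of Mechanize): write one HTML page to disk.
sub write_page {
    my ($path, $html) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $html;
    close $fh or die "Cannot close $path: $!";
}

# Build a made-up two-page site in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
write_page("$dir/index.html",
    '<html><head><title>Index</title></head><body>'
  . '<a href="second.html">Some text</a></body></html>');
write_page("$dir/second.html",
    '<html><head><title>Second</title></head><body>Done</body></html>');

# autocheck => 1 makes Mechanize die when a request fails.
my $mech = WWW::Mechanize->new(autocheck => 1);

# Load the start page through a file:// URL instead of a real site.
$mech->get(URI::file->new("$dir/index.html")->as_string);

# Inspect the loaded page: its title and the text of every link on it.
my $index_title = $mech->title;
my @link_texts  = map { $_->text } $mech->links;

# Follow a link by its text, then inspect the page we landed on.
$mech->follow_link(text => 'Some text');
my $second_title = $mech->title;
my $second_body  = $mech->content;

print "Start page: $index_title\n";
print "Link: $_\n" for @link_texts;
print "After following the link: $second_title\n";
```

Against a real site you would pass an http:// URL to get() instead. The calls to title, links, content and follow_link stay the same.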

This entry was posted on Friday, August 29, 2008 at 6:48 am and is filed under Scraping.

One Response to “How to develop a good scraper in Perl - Lesson 1”

alpha
Posted on September 24th, 2008 at 5:08 pm

Take a good look at FEAR::API.

You can get it here: http://backpan.perl.org/authors/id/X/XE/XERN/
