Friday, 16 August 2013

How to scrape the titles of a WordPress site using cURL in PHP?

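$curl1 = curl_init();
curl_setopt($curl1, CURLOPT_URL, 'Your URL');
curl_setopt($curl1, CURLOPT_HEADER, false);          // keep response headers out of the result
curl_setopt($curl1, CURLOPT_RETURNTRANSFER, true);   // return the page as a string instead of printing it
curl_setopt($curl1, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
curl_setopt($curl1, CURLOPT_ENCODING, "");           // accept every encoding cURL supports
curl_setopt($curl1, CURLOPT_USERAGENT, "spider");
curl_setopt($curl1, CURLOPT_AUTOREFERER, true);      // set the Referer header when following redirects
$result1 = curl_exec($curl1);                        // the page HTML, or false on failure
curl_close($curl1);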

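## get the title
// With PREG_OFFSET_CAPTURE every match is a [text, offset] pair,
// so $matches1[0][0][0] is the text of the first full match.
preg_match_all('/your h1 tag with class/', $result1, $matches1, PREG_OFFSET_CAPTURE);
preg_match('/your anchor tag/', $matches1[0][0][0], $set1);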

1) Initialize cURL.
2) Enter the URL of the site.
3) Check whether the WordPress site wraps its titles in an h1 tag with the same class I declared here; if not, use the correct tag and class.
4) I don't want the link tag, so the second pattern deals with the <a> tag inside the matched h1, and finally we have the result in $set1.
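
The steps above can be sketched as two small functions. This is only a minimal sketch, not the exact patterns used above: the function names fetch_page and extract_titles are made up for illustration, and the class name entry-title is an assumption about the theme, so check the page source of your site and adjust it. Instead of matching the <a> tag with a second pattern, this version drops the link markup with strip_tags.

```php
<?php
// Hypothetical helper: download a page with the same cURL options as above.
function fetch_page(string $url): string
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',
        CURLOPT_USERAGENT      => 'spider',
        CURLOPT_AUTOREFERER    => true,
    ]);
    $html = curl_exec($curl);
    // The handle is released when $curl goes out of scope.
    return $html === false ? '' : $html;
}

// Hypothetical helper: return the plain text of every <h1> whose class
// attribute contains $class ('entry-title' is an assumed default).
function extract_titles(string $html, string $class = 'entry-title'): array
{
    $pattern = '/<h1\b[^>]*\bclass="[^"]*(?<![\w-])'
             . preg_quote($class, '/')
             . '(?![\w-])[^"]*"[^>]*>(.*?)<\/h1>/is';
    preg_match_all($pattern, $html, $matches);

    $titles = [];
    foreach ($matches[1] as $inner) {
        // Remove the <a> wrapper (and any other tags), then decode entities.
        $titles[] = trim(html_entity_decode(strip_tags($inner), ENT_QUOTES, 'UTF-8'));
    }
    return $titles;
}
```

To use it, call extract_titles(fetch_page('Your URL')) and loop over the returned array. A regular expression is good enough for a quick scrape like this, but it only handles double-quoted class attributes and can break on unusual markup.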
