Simple code for scraping page information with PHP
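A simple single-page scraper built with PHP's DOM functions. (It only extracts `a` tags for now. That part works, but following the extracted links to scrape further pages is not implemented. Criticism and suggestions are welcome.)

    <?php
    // Collect libxml parse warnings internally instead of printing them for malformed HTML.
    libxml_use_internal_errors(true);

    $pages = file_get_contents('http://www.php100.com');
    if ($pages === false) {
        exit('Failed to fetch the page.');
    }

    $doc = new DOMDocument();                    // parses the fetched page
    $new_doc = new DOMDocument('1.0', 'utf-8');  // holds the extracted links
    $doc->loadHTML($pages);

    $links = $doc->getElementsByTagName('a');
    foreach ($links as $link) {
        // Create the element empty and add the text as a text node, so that
        // characters such as & in the link text are escaped correctly.
        $node = $new_doc->createElement('a');
        $node->appendChild($new_doc->createTextNode($link->nodeValue));
        $node->setAttribute('href', $link->getAttribute('href'));
        $node->setAttribute('style', 'display:block;margin-left:30px;');
        $new_doc->appendChild($node);
    }
    echo $new_doc->saveHTML();
    ?>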
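The post leaves following the extracted links to further pages unimplemented. One possible first step, sketched here under assumptions rather than as a finished crawler, is to turn each `href` into an absolute URL so it can be fetched later. The helper name `extract_links` and its parameters are hypothetical and not part of the original post. It also simplifies URL resolution: it ignores ports and `<base>` tags, and it treats every relative path as relative to the site root.

```php
<?php
// Hypothetical helper (not from the original post): returns the unique
// absolute URLs found in the href attributes of <a> tags in $html.
function extract_links(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    // Silence warnings for malformed HTML, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host']; // assumption: no port

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty links and in-page anchors
        }
        if (preg_match('#^https?://#i', $href)) {
            $url = $href;                          // already absolute
        } elseif (strpos($href, '//') === 0) {
            $url = $base['scheme'] . ':' . $href;  // protocol-relative
        } elseif ($href[0] === '/') {
            $url = $origin . $href;                // root-relative
        } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            continue; // other schemes such as mailto: or javascript:
        } else {
            $url = $origin . '/' . $href; // simplification: relative to site root
        }
        $links[$url] = true; // array keys deduplicate repeated URLs
    }
    return array_keys($links);
}
```

A crawler could then keep a queue of URLs, call `file_get_contents` on each one, pass the result to `extract_links`, and add any URL it has not seen before to the queue, ideally with a page limit and a filter that keeps it on the same host.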