
Originally Posted by lestat1116
Do you have a setup for this?
You can just use an ordinary Unix or Linux box. Here's a sample of how to spider the URLs from PNOY's website:
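Code:
# Mirror the site; wget saves it under a directory named after the host (-l 0 = no depth limit)
$ wget -r -l 0 http://president.gov.ph/
# Pull every http(s):// or www. URL out of the saved files and print an INSERT for each
$ for i in `egrep -Ihsoiwr '(http[s]*[:][/]+|www[.])[^"\<>]*' * | egrep -v '^$|^#'`; do
       echo "INSERT INTO lestat('url') VALUES('$i');"
  done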
 Sample output:
Code:
INSERT INTO lestat('url') VALUES('http://www.twinhelix.com^M');
INSERT INTO lestat('url') VALUES('http://creativecommons.org/licenses/LGPL/2.1/^M');
INSERT INTO lestat('url') VALUES('http://www.president.gov.ph/images/seal.png');
INSERT INTO lestat('url') VALUES('http://www.facebook.com/pages/Noynoy-Aquino-P-Noy/141976959168393?ref=mf?');
INSERT INTO lestat('url') VALUES('http://twitter.com/#!/PresidentNoy');
INSERT INTO lestat('url') VALUES('http://www.youtube.com/user/RTVMPNoy');
INSERT INTO lestat('url') VALUES('http://api.recaptcha.net/challenge?k=6Lc_TbwSAAAAAFnJFB4XffLPFsftPkyexkr143PJ');
INSERT INTO lestat('url') VALUES('http://api.recaptcha.net/noscript?k=6Lc_TbwSAAAAAFnJFB4XffLPFsftPkyexkr143PJ');
INSERT INTO lestat('url') VALUES('http://www.president.gov.ph');
INSERT INTO lestat('url') VALUES('WWW.PRESIDENT.GOV.PH');
INSERT INTO lestat('url') VALUES('http://gov.ph');
INSERT INTO lestat('url') VALUES('http://www.idocs.com^M');
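NOTE: Some of the output is invalid, e.g. the entries ending in ^M (a carriage return picked up from files with Windows-style CRLF line endings) and the scheme-less WWW.PRESIDENT.GOV.PH. You still need to fix the regex or add URL validation to make it more robust. Also, lestat('url') quotes the column name as a string literal; MySQL and PostgreSQL reject that, so write lestat(url) when you actually load the statements.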
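For the regex and validation fix mentioned in the note, here's one possible sketch (not tested against the live site). It assumes bash with GNU grep and sed, that you pass in the directory wget saved the mirror to, and the same lestat table with a url column; the helper names extract_urls and to_sql are made up for this example. It keeps only absolute http(s) URLs, ends a match at quotes, angle brackets, or whitespace (so the carriage returns behind the ^M entries are never captured), trims trailing ),.; characters, does a rough host-name check, dedupes, and doubles any single quote before wrapping each URL in an INSERT.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash, GNU grep/sed, and a table lestat(url).
# extract_urls and to_sql are hypothetical helper names.

# extract_urls DIR: print each distinct absolute http(s) URL found in files under DIR.
extract_urls() {
  # -r recurse, -I skip binary files, -h no file names, -s hide read errors,
  # -o print only the match, -i ignore case, -E extended regex.
  # A match stops at a quote, <, >, or any whitespace, so a trailing \r
  # (shown as ^M) is never captured.
  grep -rIhsoiE "https?://[^\"'<>[:space:]]+" "$1" |
    # Trim punctuation that often trails a URL in JS/CSS, e.g. url(...);
    sed -E 's/[),.;]+$//' |
    # Rough validation: scheme, dotted host name, optional port, path/query/fragment.
    grep -iE '^https?://[a-z0-9.-]+\.[a-z]{2,}(:[0-9]+)?([/?#].*)?$' |
    # Byte-order sort so the output is the same under any locale.
    LC_ALL=C sort -u
}

# to_sql: read one URL per line and print an INSERT for each,
# doubling single quotes so the SQL string literal stays valid.
to_sql() {
  sed "s/'/''/g; s/.*/INSERT INTO lestat (url) VALUES ('&');/"
}
```

Usage, from the directory where you ran wget and assuming the mirror landed in president.gov.ph/: extract_urls president.gov.ph | to_sql > urls.sql. Feeding the URLs through sed instead of looping over backtick output avoids word splitting and glob expansion (a URL containing ? or * could otherwise match local file names). The trade-off is that scheme-less www. hits are dropped, and a legitimate trailing ) (as in some Wikipedia URLs) gets trimmed.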
 
[ simon.cpu ]