Monday, September 29, 2014

Google Scraper

I need to analyze Google search results, and fortunately there are several open-source solutions out there. But Google hates scrapers and will block your IP if it determines that you are breaking its terms and conditions.

Possible Google scrapers (play with the sleep time between requests to avoid getting your IP blocked):
https://github.com/NikolaiT/GoogleScraper
https://github.com/MarioVilas/google


//Example using MarioVilas's google scraper: 
python google.py --stop=20 "inurl:console filetype:php" > test.txt
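//A rough sketch of spacing out several searches with a random pause, since the sleep timing is what keeps you from getting blocked. The helper names random_delay and run_with_delay are my own (hypothetical), not part of either scraper, and the pause range is something you have to tune yourself:

```shell
#!/bin/bash
# Hypothetical helper: print a random whole number of seconds in [min, max].
random_delay() {
  local min=$1 max=$2
  echo $(( min + RANDOM % (max - min + 1) ))
}

# Hypothetical helper: read one query per line from stdin and run the given
# command once per query, sleeping a random interval between runs.
# Usage: run_with_delay MIN MAX command [args...] < queries.txt
run_with_delay() {
  local min=$1 max=$2
  shift 2
  local query first=1
  while IFS= read -r query; do
    # No pause before the very first request.
    if [ "$first" -eq 0 ]; then
      sleep "$(random_delay "$min" "$max")"
    fi
    first=0
    # Redirect the command's stdin so it cannot swallow the remaining queries.
    "$@" "$query" < /dev/null
  done
}
```

With the functions loaded, something like `run_with_delay 30 90 python google.py --stop=20 < queries.txt > test.txt` would run one search per line of queries.txt (an assumed file name). The 30 to 90 second range is only a starting guess, not a known safe value.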

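//If you need to remove the URL parameters (everything from the first "?" onward), a simple bash script is perfect:
vi removeparameter.sh
#!/bin/bash
# Print each URL from test.txt with its query string stripped.
while IFS= read -r url; do
  printf '%s\n' "${url%%\?*}"
done < test.txt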
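
//Once the parameters are gone, many URLs collapse into the same line. A small sketch of the same "%%" expansion wrapped in functions, plus de-duplication with sort -u. The names strip_query and strip_and_dedupe are hypothetical, just for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print a URL with everything from the first "?" removed.
# "%%" deletes the longest suffix matching the pattern "?*".
strip_query() {
  printf '%s\n' "${1%%\?*}"
}

# Hypothetical helper: strip the query string from every URL on stdin,
# then sort the results and drop duplicates.
strip_and_dedupe() {
  local url
  while IFS= read -r url; do
    strip_query "$url"
  done | sort -u
}
```

For example, `strip_query 'http://example.com/console.php?id=1&x=2'` prints `http://example.com/console.php`, and `strip_and_dedupe < test.txt > unique.txt` writes each distinct base URL once (unique.txt is an assumed output name).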
