Lin Hsin Hsin Artificial Intelligence Center
Spiders, Scrapers, Crawlers & Bots
What is a WEB CRAWLER?
Definition
A web crawler is a digital search engine BOT that uses:
🛠 copy & metadata
to discover & index web pages
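As a rough illustration of the definition above, the following minimal sketch follows links from page to page and records each page's title and meta description. It assumes a hypothetical in-memory site (the `PAGES` dictionary and the `example.com` URLs are made up) so that it runs without network access. A real crawler would fetch pages over HTTP and respect robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "https://example.com/": (
        '<title>Home</title><meta name="description" content="Start page">'
        '<a href="/about">About</a><a href="/blog">Blog</a>'
    ),
    "https://example.com/about": '<title>About</title><a href="/">Home</a>',
    "https://example.com/blog": '<title>Blog</title><a href="/about">About</a>',
}


class PageParser(HTMLParser):
    """Collects the title, meta description, and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url):
    """Breadth-first crawl: discover pages via links, index copy and metadata."""
    index = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in index or url not in PAGES:
            continue  # already indexed, or not part of the hypothetical site
        parser = PageParser()
        parser.feed(PAGES[url])
        parser.close()
        index[url] = {"title": parser.title, "description": parser.description}
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return index


index = crawl("https://example.com/")
for url, entry in index.items():
    print(url, entry)
```

Starting from the home page, the sketch discovers the other two pages only through the links it finds, which is the "discover" half of the definition. The `index` dictionary stands in for the "index" half.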
Types of Web Crawlers by Functionality:
🛠 Focused web crawlers
🛠 Incremental web crawlers
🛠 Distributed crawlers
🛠 Parallel crawlers
🛠 Deep web crawlers
🛠 Screen scrapers
Web Crawlers by Classification:
Search Engine Bots
Used by search engines to:
🎯 Crawl websites
🎯 View images
🎯 View links
🎯 Index them for search results
Commercial Bots
Used by some SEO websites to generate SEO reports for a selected website
so that any SEO issues on the site can be identified & fixed
Examples:
🕸️ AhrefsBot by ahrefs.com
🕸️ SemrushBot by Semrush.com
🕸️ Barkrowler by Babbar.tech
Feed Fetcher Bots
Used to collect thumbnails & titles of content to display on their own websites
Examples:
facebookexternalhit – used by Facebook
Twitterbot – used by Twitter
Monitoring Bots
Used to check the performance of websites:
🎯 uptime
🎯 pingback
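The bot classes listed above (search engine, commercial, feed fetcher, monitoring) can often be told apart by the User-Agent string a bot sends. The sketch below shows one simple way to do that with substring matching. The token table is an assumption for illustration: `googlebot`, `bingbot` and `uptimerobot` are not named in this page, and the helper `classify_bot` is hypothetical. User-Agent strings can be spoofed, so this is not a reliable verification method.

```python
# Illustrative token table; the search engine and monitoring tokens are
# assumptions added for the example, the others come from the lists above.
BOT_CATEGORIES = {
    "search engine": ["googlebot", "bingbot"],
    "commercial": ["ahrefsbot", "semrushbot", "barkrowler"],
    "feed fetcher": ["facebookexternalhit", "twitterbot"],
    "monitoring": ["uptimerobot"],
}


def classify_bot(user_agent):
    """Return the bot category for a User-Agent string, or "unknown"."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if any(token in ua for token in tokens):
            return category
    return "unknown"


print(classify_bot("Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"))
print(classify_bot("facebookexternalhit/1.1"))
print(classify_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

The three calls return "commercial", "feed fetcher" and "unknown" respectively.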
What is a SPIDER?
Definition
A web spider is similar to a crawler but it is more focused on indexing the textual content of a web page
It is deployed by search engines to scan & index the web
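Indexing the textual content of a page can be pictured as building a word-to-page lookup table (an inverted index). The sketch below is one minimal way to do this, assuming two made-up pages and the hypothetical helpers `TextExtractor` and `build_index`. Real search engines use far more elaborate pipelines.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()
        text = " ".join(extractor.chunks).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
    return index


# Hypothetical pages used only for this example.
pages = {
    "/a": "<h1>Web crawlers</h1><p>Crawlers discover pages.</p>",
    "/b": "<p>Scrapers extract prices.</p><script>var crawlers = 1;</script>",
}
index = build_index(pages)
print(sorted(index["crawlers"]))
print(sorted(index["prices"]))
```

The word "crawlers" maps only to `/a`, because the occurrence inside the `<script>` element of `/b` is not visible text and is skipped.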
What is a SCRAPER?
Definition
📍 A web scraper is a program or script that EXTRACTS specific data from websites
📍 Unlike crawlers, which collect information about websites,
scrapers focus on the CONTENTS of the site, EXTRACTING:
📝 text
🖼 images
📹 videos
🗣 audio
💲 prices
🎯 any other specific elements
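To make the contrast with crawling concrete, the sketch below extracts only specific elements (image URLs and prices) from a single page and ignores everything else. The HTML snippet, the `price` class name and the `ProductScraper` helper are all hypothetical. Real pages need selectors matched to their actual markup, and scraping should respect the site's terms of use.

```python
import re
from html.parser import HTMLParser

# Hypothetical product listing used only for this example.
HTML = """
<div class="product"><h2>Blue Mug</h2><img src="/img/mug.png"><span class="price">$12.50</span></div>
<div class="product"><h2>Red Cup</h2><img src="/img/cup.png"><span class="price">$8.00</span></div>
"""


class ProductScraper(HTMLParser):
    """Extracts image URLs and numeric prices, ignoring all other content."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "span" and attrs.get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            match = re.search(r"\d+(?:\.\d+)?", data)  # first number in the text
            if match:
                self.prices.append(float(match.group()))


scraper = ProductScraper()
scraper.feed(HTML)
scraper.close()
print(scraper.images)  # ['/img/mug.png', '/img/cup.png']
print(scraper.prices)  # [12.5, 8.0]
```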