What is Robots.txt & Guide to Generate It
In this video we explain what a robots.txt file is, why it is important, and how to create one, along with 4 important facts about robots.txt.
What Is Robots.txt?
A robots.txt file is like a guide who tells you where you may and may not go. In the same way, a robots.txt file tells search engine crawlers which URLs they can access on your site, and asks search engine spiders not to crawl certain pages or sections of the website. Most major search engines (including Google and Bing) recognize and honor these requests.
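To make this concrete, here is a minimal sketch of how a crawler could decide whether it may fetch a URL. It uses Python's standard urllib.robotparser module. The rules, the example.com URLs, and the helper name is_allowed are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from a real site.
RULES = """\
User-agent: *
Disallow: /private-page
"""

def is_allowed(rules_text: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(RULES, "Googlebot", "https://example.com/private-page"))  # False
print(is_allowed(RULES, "Googlebot", "https://example.com/blog"))          # True
```

A well-behaved crawler runs a check like this before each request. Nothing forces it to, which is why robots.txt is a request rather than a lock.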
Why is Robots.txt Important?
There are 3 main reasons to use robots.txt:
1. Block Non-Public Pages – If your website has unwanted pages, private pages, or a login page that crawlers should not visit, you need to use robots.txt.
2. Maximize Crawl Budget – Crawl budget is the number of pages a bot crawls within a given time frame. So, if not all of your pages are being indexed, use robots.txt to block unimportant pages so the bot spends its budget on the pages that matter.
3. Prevent Indexing of Resources – Using a meta robots tag (noindex) you can prevent pages from getting indexed. However, this does not always work: files such as PDFs and images cannot carry a meta tag, so if you don’t want them indexed, use robots.txt.
How to Create a Robots.txt File
The minimum required text in a robots.txt file is:
1. User-agent – Any person or program active on the Internet has a “user agent,” or an assigned name. For humans, it identifies the browser and operating system they use. Search engines use bots, known as crawlers or indexers, and each bot has its own user agent name.
2. Allow / Disallow – The Disallow command is the most common in the robots exclusion protocol. It tells bots not to access the web page or path that comes after the command. The Allow command, in contrast, tells bots that they may access the web page or path that comes after the command.
3. Sitemap (optional, but helpful if you use it)
Some examples of user agents:
Google: Googlebot, Googlebot-Image (for images), Googlebot-News (for news), Googlebot-Video (for video)
Bing: Bingbot, MSNBot-Media (for images and video)
Baidu: Baiduspider
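The three directives above can be seen working together in a short sketch, again with Python's standard urllib.robotparser. The rules below are hypothetical: one group applies only to Googlebot, a second group applies to every other bot, and a Sitemap line closes the file. Note that site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: Googlebot
Disallow: /private-page

User-agent: *
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot matches its own group, so only /private-page is blocked for it.
print(parser.can_fetch("Googlebot", "https://example.com/blog"))          # True
print(parser.can_fetch("Googlebot", "https://example.com/private-page"))  # False

# Bingbot has no group of its own and falls back to the "*" group.
print(parser.can_fetch("Bingbot", "https://example.com/blog"))            # False

# Sitemap lines are collected separately from the per-agent groups.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```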
How to Use Robots.txt File
Step 1 : Create a robots.txt file
Examples of robots.txt files
Example : 1 (allow)
User-agent: *
Allow: /
Example : 2 (disallow)
User-agent: Googlebot
Allow: /
Disallow: /private-page
Disallow: /login-page
Example : 3 (multi agent)
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /
Example : 4 (with site map)
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
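Example 2 contains both "Allow: /" and more specific Disallow lines, so it is worth seeing how such a conflict is resolved. Google's documentation and RFC 9309 describe the most specific (longest) matching rule as the winner. The sketch below is a simplified illustration of that precedence, not a full parser: it ignores the * and $ wildcards, and the names EXAMPLE_2_RULES and is_path_allowed are made up here. Python's built-in urllib.robotparser checks rules in file order instead, so it is not used for this sketch.

```python
# Rules from Example 2, written as (directive, path) pairs.
EXAMPLE_2_RULES = [
    ("allow", "/"),
    ("disallow", "/private-page"),
    ("disallow", "/login-page"),
]

def is_path_allowed(rules, path):
    """Return True if path may be crawled under longest-match precedence."""
    best_length = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty or non-matching rule has no effect
        longer = len(prefix) > best_length
        tie_won_by_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_won_by_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed

print(is_path_allowed(EXAMPLE_2_RULES, "/blog"))          # True
print(is_path_allowed(EXAMPLE_2_RULES, "/private-page"))  # False
print(is_path_allowed(EXAMPLE_2_RULES, "/login-page"))    # False
```

Under this rule the order of the lines does not matter: "/private-page" is longer than "/", so the Disallow wins for that page while the rest of the site stays crawlable.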
Step 2 : Upload it to the root of the website. Typically this is the public_html folder.
Step 3 : Verify it by opening the URL yoursite.com/robots.txt. If the file opens, your robots.txt file was uploaded successfully. You can check any site’s robots.txt file the same way: simply add /robots.txt to the end of the site’s address, like websiteurl.com/robots.txt.
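The "add /robots.txt to the site address" rule can be sketched in a few lines of Python. The helper name robots_txt_url and the example.com address are assumptions for illustration. The function keeps only the scheme and host of any page URL and replaces the rest with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # Drop the page path, query, and fragment; robots.txt lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/blog/post?id=7"))
# https://example.com/robots.txt
```

Opening the resulting URL in a browser is the check described in Step 3. Fetching it from a script needs network access, so that part is left out of the sketch.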
Robots.txt Important Points :
- The file must be named robots.txt.
- Your site can have only one robots.txt file.
- The robots.txt file must be located at the root of the website. You can check by opening the URL yoursite.com/robots.txt.
- The robots.txt file may not be supported by all search engines.