Contradictory rules in robots.txt

I'm attempting to scrape a website, and these two rules in its robots.txt seem to contradict each other:

User-agent: *
Disallow: *
Allow: /

Does Allow: / mean that I can scrape the entire website, or just the root? If it means I can scrape the entire site, then it directly contradicts the previous rule.

web-scraping robots.txt

Solution 1:^[1]

If you are following the original robots.txt standard:

The * in the disallow line would be treated as a literal rather than a wildcard. That line would disallow URL paths that start with an asterisk. All URL paths start with a /, so that rule disallows nothing.
The Allow rule isn't in the original specification, so that line would be ignored.
Anything that isn't specifically disallowed is allowed to be crawled.

Verdict: You can crawl the site.
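
To make the reasoning above concrete, here is a minimal sketch of literal prefix matching as the original standard describes it. The function name `allowed_original` is hypothetical and this is not taken from any real crawler; it only models Disallow lines, since Allow is ignored under this reading.

```python
def allowed_original(path, disallow_prefixes):
    """Sketch of the original convention: Disallow values are literal prefixes."""
    for prefix in disallow_prefixes:
        # An empty Disallow value disallows nothing, so skip it.
        if prefix and path.startswith(prefix):
            return False
    # Anything not specifically disallowed may be crawled.
    return True


# "Disallow: *" is the literal prefix "*"; the "Allow: /" line is ignored.
rules = ["*"]
print(allowed_original("/", rules))           # True
print(allowed_original("/page.html", rules))  # True
```

Because every URL path begins with "/", no path can start with a literal "*", so the loop never finds a match.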

Google and a few other crawlers support wildcards and Allow rules. If you are following Google's extensions to robots.txt, here is how Google would interpret this robots.txt:

Both Allow: / and Disallow: * match any specific path on the site.
In the case of such a conflict, the more specific (i.e. longer) rule wins. / and * are each one character, so neither is considered more specific than the other.
In a case of a tie for specificity, the least restrictive rule wins. Allow is considered less restrictive than Disallow.

Verdict: You can crawl the site.
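
The precedence steps above can be sketched as follows. This is an illustrative approximation of the longest-match behaviour described in this answer, not Google's actual parser; the names `pattern_to_regex` and `is_allowed` are hypothetical, and rule length is simply the character count of the pattern.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    # A trailing '$' anchors the pattern to the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """rules is a list of (kind, pattern), kind being "allow" or "disallow"."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # The longer pattern wins; on a tie, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [("disallow", "*"), ("allow", "/")]
print(is_allowed("/any/page", rules))  # True
```

Both patterns match and both are one character long, so the tie goes to Allow regardless of the order in which the two lines appear.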

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
