Is the File-Object iterator "broken?"
According to the documentation:
Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
However, for file-objects:
>>> f = open('test.txt')
>>> list(f)
['a\n', 'b\n', 'c\n', '\n']
>>> next(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> f.seek(0)
0
>>> next(f)
'a\n'
Are file-object iterators broken? Is this just one of those things that can't be fixed because it would break too much existing code that relies on it?
Solution 1:[1]
I think this is, if anything, a docs bug on that paragraph, not a bug in io objects. (And io objects aren’t the only thing: most trivially, a csv.reader wrapper around a file is just as restartable as a file.)
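As a minimal sketch of the csv.reader point, the snippet below uses an io.StringIO buffer as a stand-in for a real file (an assumption made only to keep the example self-contained). The helper name raises_stop_iteration is hypothetical. Used purely as an iterator, the reader keeps raising StopIteration; it only resumes after seek() is called on the underlying stream, which is outside the iterator protocol.

```python
import csv
import io

# io.StringIO stands in for an open text file (assumption for this sketch).
buf = io.StringIO("a,1\nb,2\n")
reader = csv.reader(buf)

# Consume everything through the iterator protocol.
rows = list(reader)


def raises_stop_iteration(iterator):
    """Hypothetical helper: True if next() raises StopIteration.

    Note that it consumes one item when the iterator is not exhausted.
    """
    try:
        next(iterator)
    except StopIteration:
        return True
    return False


# As long as only next() is used, the reader stays exhausted.
exhausted_twice = raises_stop_iteration(reader) and raises_stop_iteration(reader)

# Rewinding the underlying stream is a call outside the iterator protocol;
# afterwards the same reader object produces rows again.
buf.seek(0)
first_again = next(reader)
```

Here the "refill" happens only because of the explicit seek(0); nothing in the reader restarts on its own.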
If you just use an iterator as an iterator, once it raises it will keep on raising. But if you call methods outside of the iterator protocol, you’re not really using it as an iterator anymore, but as something more than an iterator. And in that case, it seems legal and even idiomatic for the object to be “refillable” if it makes sense. As long as it never refills itself while it’s quacking as an iterator, only when it’s quacking as some other type that goes beyond that.
In a similar situation in C++, the language committee might well declare that this breaks substitutability and therefore the iterator becomes invalid as an iterator once you call such a method on it, even if the language can’t enforce that. Or come up with a whole new protocol for refillable iterators. (Of course C++ iterators aren’t quite the same thing as Python iterators, but hopefully you get what I mean.)
But in Python, practicality beats purity. I’m pretty sure Guido intended this behavior from the start: an object is allowed to do this and still be considered an iterator. The core devs continue to intend it, too. It’s just that nobody has thought about how to write something sufficiently rigorous to explain it accurately, because nobody has asked.
If you ask by filing a docs bug, I’ll bet that this paragraph gets a footnote, rather than the io and other refillable iterator objects being reclassified as not actually iterators.
Solution 2:[2]
Yes, file iterators are "deemed broken" according to the section of the stdtypes documentation quoted in the question. Both the Python 3 iterator TextIOWrapper and the Python 2 iterator file are broken.
This is something worth keeping in mind if you're using code which assumes that iterators strictly adhere to the iterator protocol. To give one example, using the Python implementation of itertools.dropwhile in combination with a file iterator is buggy. You might encounter issues when iterating over a log file whilst another process is still appending lines to it.
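To make the dropwhile problem concrete, here is a sketch under stated assumptions. dropwhile_py is a two-loop pure-Python version along the lines of the equivalent given in the itertools documentation. ResumingLines is a hypothetical class that simulates a growing log file: it raises StopIteration at each None marker and then resumes, much as a file iterator can after another process appends lines. All names and the sample data are illustrative.

```python
import itertools


def dropwhile_py(predicate, iterable):
    """Two-loop pure-Python dropwhile (sketch modelled on the itertools docs)."""
    iterator = iter(iterable)
    for x in iterator:
        if not predicate(x):
            yield x
            break
    # If the first loop ended on StopIteration rather than break, this loop
    # asks the iterator again and yields whatever it gets, unchecked.
    for x in iterator:
        yield x


class ResumingLines:
    """Hypothetical 'broken' iterator: stops at each None, then resumes."""

    def __init__(self, items):
        self._items = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._items)  # StopIteration here means truly exhausted
        if item is None:
            raise StopIteration   # temporary end of data, like EOF on a log
        return item


def is_debug(line):
    return line.startswith("DEBUG")


# None marks the moment the reader hits EOF before more lines are appended.
events = ["DEBUG one\n", "DEBUG two\n", None, "DEBUG three\n"]

# The pure-Python version leaks a line the predicate should have dropped.
leaked = list(dropwhile_py(is_debug, ResumingLines(events)))

# The C implementation in CPython simply stops at the first StopIteration.
builtin = list(itertools.dropwhile(is_debug, ResumingLines(events)))
```

With this data, leaked is ['DEBUG three\n'] while builtin is []: the second loop in the pure-Python version runs after a StopIteration that turned out not to be final.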
There was a discussion about this question in the mailing lists. Search the September 2008 archives for Why are "broken iterators" broken? A couple of quotes:
Strictly speaking, file objects are broken iterators.
It's a design guideline, not an absolute rule.
And Terry Reedy:
It is quite possible that a stream reader will return '' on one call and then something non-empty the next. An iterator that reads a stream and yields chunks of whatever size should either block until it gets sufficient data or yield nulls as long as the stream is open and not raise StopIteration until the stream is closed and it has yielded the last chunk of data.
There is an important difference between a store that is closed until the next day and one that is closed - out of business. Similarly, there is a difference between an item being out-of-stock until the next delivery and out-of-stock and discontinued permanently, or between a road closed for repairs versus removal for something else. Using the same sign or signal for temporary and permanent conditions is confusing and therefore 'broken'.
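One way to read the design described in that quote is sketched below. It is an illustration, not code from the mailing-list thread. The generator follow yields '' while the stream is temporarily out of data and finishes only once the producer has signalled that it is done. The is_closed callable, the append helper, and the use of io.StringIO as the stream are all assumptions made for the example.

```python
import io


def follow(stream, is_closed):
    """Yield lines; yield '' when temporarily out of data; stop when closed.

    is_closed is a hypothetical zero-argument callable that returns True
    once the producer has finished writing.
    """
    while True:
        line = stream.readline()
        if line:
            yield line
        elif is_closed():
            return          # permanent end: StopIteration from now on
        else:
            yield ""        # temporary end: a "null", not StopIteration


def append(buf, text):
    """Hypothetical helper: append to a StringIO, keeping the read position."""
    pos = buf.tell()
    buf.seek(0, io.SEEK_END)
    buf.write(text)
    buf.seek(pos)


buf = io.StringIO("a\n")
state = {"closed": False}
lines = follow(buf, lambda: state["closed"])

first = next(lines)      # data that is already there
idle = next(lines)       # nothing new yet, stream still open
append(buf, "b\n")
second = next(lines)     # data that arrived later
state["closed"] = True
rest = list(lines)       # producer is done, so the generator finishes
```

Because follow is a generator, it keeps raising StopIteration once it has returned, so the temporary and permanent conditions get different signals.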
I think this behavior is unlikely to change in the language ("Practicality beats purity"), but perhaps the language in the docs will be softened up. There is an existing open issue about that, if you want to follow it: issue23455
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | abarnert |
Solution 2 |