Apache Beam FileIO match: what's the better/more efficient way to match files?
I'm wondering whether using a wildcard affects how Beam matches files. For instance, if I want to match a single file with Apache Beam, is there an advantage to specifying a direct path to it (e.g. folder/subfolder/file.txt)? Or, if I pass a wildcard to the match() method instead, would it be just as efficient or worse in terms of the framework's performance?
Thanks
Solution 1:[1]
Compared to the cost of reading the file (and of spinning up workers, if you are running on a distributed runner), the cost of matching is negligible. On the other hand, issuing multiple reads, each with its own direct path, will generally incur more overhead than a single read of a wildcard match.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | robertwb |