How can I extract ISO and ASTM standards from a text using regex?

I would like to extract ISO and ASTM standard designations from a text. Each match should consist of the literal ISO or ASTM followed by its number.

Rules:

  • Match starts with ISO or ASTM
  • ASTM is followed by a D
  • This is followed by a number, optionally preceded by a space or hyphen, that can itself contain spaces and hyphens
  • As soon as the number sequence ends, the match ends

Possible pattern for the first two rules:

(?:ISO|ASTM\s*D)
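
As a quick check, here is a minimal Python sketch (the variable names are illustrative, not part of the question) suggesting that this pattern on its own matches only the prefixes, so the number part still has to be added:

```python
import re

# Sample text from the example in this question.
text = (
    "ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. "
    "ISO 31 33, ISO 334 9 are specific to static bending, "
    "but ASTM D 149-3 includes various other 9."
)

# Pattern covering only the first two rules (prefix, and D after ASTM).
prefix_pattern = re.compile(r"(?:ISO|ASTM\s*D)")

# findall returns the full match because the group is non-capturing.
prefixes = prefix_pattern.findall(text)
print(prefixes)
# ['ISO', 'ISO', 'ASTM D', 'ISO', 'ISO', 'ASTM D']
```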

Example:

ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. ISO 31 33, ISO 334 9 are specific to static bending, but ASTM D 149-3 includes various other 9.

https://regex101.com/r/IFlqT2/1

What would a corresponding regex look like?



Solution 1:[1]

You can use

(?:ISO|ASTM\s*D)(?:[\s-]*\d)+

Details:

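  • (?:ISO|ASTM\s*D) - ISO, or ASTM followed by zero or more whitespace characters and D
  • (?:[\s-]*\d)+ - one or more repetitions of
    • [\s-]* - zero or more whitespace characters or hyphens
    • \d - a digit.

Because every repetition ends with \d, the match always stops on a digit, so trailing spaces or hyphens are never included.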

See the regex demo.
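
A minimal sketch of how this pattern could be applied in Python with the standard re module. The function name extract_standards and the sample variable are assumptions made for illustration only:

```python
import re

# Pattern from the answer: prefix, then one or more digits, each optionally
# preceded by whitespace or hyphens.
STANDARD_RE = re.compile(r"(?:ISO|ASTM\s*D)(?:[\s-]*\d)+")


def extract_standards(text: str) -> list[str]:
    """Return every ISO / ASTM D designation found in text (hypothetical helper)."""
    return STANDARD_RE.findall(text)


# Sample text from the question.
sample = (
    "ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. "
    "ISO 31 33, ISO 334 9 are specific to static bending, "
    "but ASTM D 149-3 includes various other 9."
)

result = extract_standards(sample)
print(result)
# ['ISO 527-1', 'ISO 3349-3', 'ASTM D143', 'ISO 31 33', 'ISO 334 9', 'ASTM D 149-3']
```

The trailing " 9." in the sample is not matched because it has no ISO or ASTM D prefix. If prefixes at the end of longer words (for example an uppercase word ending in ISO and followed by a number) are a concern, a word boundary \b could be placed before the alternation; that is an optional tweak, not part of the answer above.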

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew