JavaScript regex for matching/extracting file extension

The following regex

var patt1=/[0-9a-z]+$/i;

extracts the file extension of strings such as

filename-jpg
filename#gif
filename.png

How can this regular expression be modified so that it only returns an extension when the string really is a filename with a dot as the separator? (Obviously, filename#gif is not a regular filename.)

UPDATE: Based on tvanofsson's comments, I would like to clarify that when the JS function receives the string, the string will already contain a filename without spaces, dots, or other special characters (it will actually be handled as a slug). The problem was not in parsing filenames but in incorrectly parsing slugs: the function was returning an extension of "jpg" when given "filename-jpg", when it should really return null or an empty string. It is this behaviour that needed to be corrected.



Solution 1:[1]

Just add a . to the regex

var patt1=/\.[0-9a-z]+$/i;

Because the dot is a special character in regex, you need to escape it as \. to match it literally.

Your pattern will now match any string that ends with a dot followed by at least one character from [0-9a-z].

Example:

[
  "foobar.a",
  "foobar.txt",
  "foobar.foobar1234"
].forEach( t => 
  console.log(
    t.match(/\.[0-9a-z]+$/i)[0]
  ) 
)

If you also want to limit the extension to a certain number of characters, then you need to replace the + with a bounded quantifier:

var patt1=/\.[0-9a-z]{1,5}$/i;

would allow at least 1 and at most 5 characters after the dot.
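
The example above indexes the match result directly, which throws when there is no match. As a minimal sketch of the behaviour the question asks for (null for slugs such as "filename-jpg"), the pattern can be wrapped in a small helper. The name extensionOrNull is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns the extension including its dot, or null.
function extensionOrNull(name) {
  var m = name.match(/\.[0-9a-z]{1,5}$/i);
  // String.prototype.match returns null when nothing matches,
  // so guard before reading index 0.
  return m ? m[0] : null;
}

console.log(extensionOrNull("filename.png"));      // ".png"
console.log(extensionOrNull("filename-jpg"));      // null
console.log(extensionOrNull("foobar.foobar1234")); // null (more than 5 characters after the dot)
```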

Solution 2:[2]

Try

var patt1 = /\.([0-9a-z]+)(?:[\?#]|$)/i;

This RegExp is useful for extracting file extensions from URLs - even ones that have ?foo=1 query strings and #hash endings.

It will also provide you with the extension as $1.

var m1 = ("filename-jpg").match(patt1);
alert(m1);  // null

var m2 = ("filename#gif").match(patt1);
alert(m2);  // null

var m3 = ("filename.png").match(patt1);
alert(m3);  // [".png", "png"]

var m4 = ("filename.txt?foo=1").match(patt1);
alert(m4);  // [".txt?", "txt"]

var m5 = ("filename.html#hash").match(patt1);
alert(m5);  // [".html#", "html"]
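
If only the bare extension is needed, capture group 1 can be read from the match result. The following is a rough sketch; the function name urlExtension is an assumption made for illustration.

```javascript
// Hypothetical helper: returns the extension without the dot, or null.
function urlExtension(url) {
  var m = url.match(/\.([0-9a-z]+)(?:[\?#]|$)/i);
  return m ? m[1] : null; // m[1] is the first capture group
}

console.log(urlExtension("filename.txt?foo=1"));  // "txt"
console.log(urlExtension("filename.html#hash"));  // "html"
console.log(urlExtension("filename-jpg"));        // null
console.log(urlExtension("http://example.com"));  // "com"
```

As the last call shows, the pattern does not know about URL structure, so a bare host name that ends in a dot-separated label is also reported as an extension.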

P.S. +1 for @stema who offers pretty good advice on some of the RegExp syntax basics involved.

Solution 3:[3]

Example list:

var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi
//regex flags -- Global, Multiline, Insensitive

var ma1 = 'css/global.css?v=1.2'.match(fileExtensionPattern)[0];
console.log(ma1);
// returns .css

var ma2 = 'index.html?a=param'.match(fileExtensionPattern)[0];
console.log(ma2);
// returns .html

var ma3 = 'default.aspx?'.match(fileExtensionPattern)[0];
console.log(ma3);
// returns .aspx

var ma4 = 'pages.jsp#firstTab'.match(fileExtensionPattern)[0];
console.log(ma4);
// returns .jsp

var ma5 = 'jquery.min.js'.match(fileExtensionPattern)[0];
console.log(ma5);
// returns .js

var ma6 = 'file.123'.match(fileExtensionPattern)[0];
console.log(ma6);
// returns .123
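
One thing worth knowing about the examples above: because the pattern uses the global flag, match returns an array of every match (not capture groups), or null when nothing matches, in which case indexing with [0] throws. A minimal sketch, with firstExtension as a hypothetical wrapper name:

```javascript
var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi;

// With the g flag, every match in the string is returned.
var all = 'css/global.css?v=1.2'.match(fileExtensionPattern);
console.log(all); // [".css", ".2"]

// Hypothetical wrapper: first match, or null when there is none.
function firstExtension(s) {
  var matches = s.match(fileExtensionPattern);
  return matches ? matches[0] : null;
}

console.log(firstExtension('jquery.min.js')); // ".js"
console.log(firstExtension('README'));        // null
```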


Solution 4:[4]

ONELINER:

let ext = (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1] 

The above solution also handles links. It takes everything between the last dot and the first "?" or "#" character, or the end of the string. To ignore "?" and "#" characters, use /\.([^.]*)$/. To ignore only "#", use /\.([^.]*?)(?=\?|$)/. Examples:

function getExtension(filename) {
  return (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1];
}


// TEST
[
  "abcd.Ef1",
  "abcd.efg",
  "abcd.efg?aaa&a?a=b#cb",
  "abcd.efg#aaa__aa?bb",
  "abcd",
  "abcdefg?aaa&aa=bb",
  "abcdefg#aaa__bb",
].forEach(t=> console.log(`${t.padEnd(21,' ')} -> ${getExtension(t)}`))
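
To make the difference between the three patterns mentioned above concrete, here is a small side-by-side sketch. The object keys and the extWith helper are names chosen for illustration only.

```javascript
// The three patterns from the answer, under hypothetical labels.
var variants = {
  stopAtQueryOrHash: /\.([^.]*?)(?=\?|#|$)/,
  ignoreBoth: /\.([^.]*)$/,
  stopAtQueryOnly: /\.([^.]*?)(?=\?|$)/
};

// Same "|| []" guard as the one-liner: undefined when nothing matches.
function extWith(pattern, s) {
  return (s.match(pattern) || [])[1];
}

console.log(extWith(variants.stopAtQueryOrHash, "abcd.efg?x=1#top")); // "efg"
console.log(extWith(variants.ignoreBoth, "abcd.efg?x=1#top"));        // "efg?x=1#top"
console.log(extWith(variants.stopAtQueryOnly, "abcd.efg?x=1#top"));   // "efg"
console.log(extWith(variants.stopAtQueryOnly, "abcd.efg#top"));       // "efg#top"
```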

Solution 5:[5]

I found this solution in the O'Reilly Regular Expressions Cookbook (chapter 8, section 24). It is case-insensitive and works with .NET, Java, JavaScript, PCRE, Perl, Python & Ruby.

\.[^.\\/:*?"<>|\r\n]+$

A file extension must begin with a dot. Thus, we add ‹\.› to match a literal dot at the start of the regex.

Filenames such as Version 2.0.txt may contain multiple dots. The last dot is the one that delimits the extension from the filename. The extension itself should not contain any dots. We specify this in the regex by putting a dot inside the character class. The dot is simply a literal character inside character classes, so we don’t need to escape it. The ‹$› anchor at the end of the regex makes sure we match .txt instead of .0.

If the string ends with a backslash, or with a filename that doesn’t include any dots, the regex won’t match at all. When it does match, it will match the extension, including the dot that delimits the extension and ...
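
As a quick sketch of how this pattern behaves in JavaScript (the forward slash inside the character class is escaped here for safety in a regex literal, and cookbookExtension is a hypothetical helper name, not something from the book):

```javascript
// The cookbook pattern written as a JavaScript regex literal.
var cookbookPattern = /\.[^.\\\/:*?"<>|\r\n]+$/;

// Hypothetical helper: returns the extension including its dot, or null.
function cookbookExtension(path) {
  var m = path.match(cookbookPattern);
  return m ? m[0] : null;
}

console.log(cookbookExtension("Version 2.0.txt")); // ".txt"
console.log(cookbookExtension("C:\\my.dir\\"));    // null (string ends with a backslash)
console.log(cookbookExtension("README"));          // null (no dot)
```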

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 Community
Solution 3 Yassin Mokni
Solution 4
Solution 5 James Moberg