JavaScript regex for matching/extracting file extension
The following regex
var patt1=/[0-9a-z]+$/i;
extracts the file extension of strings such as
filename-jpg
filename#gif
filename.png
How can this regular expression be modified so that it only returns an extension when the string really is a filename with one dot as the separator? (Obviously filename#gif is not a regular filename.)
UPDATE: Based on tvanofsson's comments, I would like to clarify that when the JS function receives the string, the string will already contain a filename without spaces, dots, or other special characters (it will actually be handled as a slug). The problem was not in parsing filenames but in incorrectly parsing slugs: the function was returning an extension of "jpg" when it was given "filename-jpg", when it should really return null or an empty string, and it is this behaviour that needed to be corrected.
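To make the reported behaviour concrete, the following sketch runs the original pattern against the three sample strings. The helper name `lastAlnumRun` is hypothetical and exists only for illustration.

```javascript
// Original pattern from the question: matches the trailing run of letters/digits.
var patt1 = /[0-9a-z]+$/i;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function lastAlnumRun(name) {
  var m = name.match(patt1);
  return m ? m[0] : null;
}

["filename-jpg", "filename#gif", "filename.png"].forEach(function (name) {
  console.log(name + " -> " + lastAlnumRun(name));
});
```

All three strings produce a result because the pattern does not require a dot before the trailing characters.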
Solution 1:[1]
Just add a . to the regex:
var patt1=/\.[0-9a-z]+$/i;
Because the dot is a special character in regex, you need to escape it to match it literally: \.
Your pattern will now match any string that ends with a dot followed by at least one character from [0-9a-z].
Example:
[
"foobar.a",
"foobar.txt",
"foobar.foobar1234"
].forEach( t =>
console.log(
t.match(/\.[0-9a-z]+$/i)[0]
)
)
If you also want to limit the extension to a certain number of characters, then you need to replace the + with a bounded quantifier:
var patt1=/\.[0-9a-z]{1,5}$/i;
This would allow at least 1 and at most 5 characters after the dot.
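Note that indexing the result of `match` with `[0]` throws when there is no match, which is exactly the slug case from the question. A minimal sketch of a wrapper that returns null instead (the function name `extensionOf` is an assumption, not part of the original answer):

```javascript
// Hypothetical wrapper around the pattern from this answer.
function extensionOf(name) {
  var m = name.match(/\.[0-9a-z]+$/i);
  // match() returns null when the string does not end in ".<letters/digits>"
  return m ? m[0] : null;
}

console.log(extensionOf("filename.png")); // ".png"
console.log(extensionOf("filename-jpg")); // null
console.log(extensionOf("filename#gif")); // null
```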
Solution 2:[2]
Try
var patt1 = /\.([0-9a-z]+)(?:[\?#]|$)/i;
This RegExp is useful for extracting file extensions from URLs, even ones that have ?foo=1 query strings and #hash endings.
It will also provide you with the extension as $1 (the first capture group).
var m1 = ("filename-jpg").match(patt1);
alert(m1); // null
var m2 = ("filename#gif").match(patt1);
alert(m2); // null
var m3 = ("filename.png").match(patt1);
alert(m3); // [".png", "png"]
var m4 = ("filename.txt?foo=1").match(patt1);
alert(m4); // [".txt?", "txt"]
var m5 = ("filename.html#hash").match(patt1);
alert(m5); // [".html#", "html"]
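If only the extension itself is needed, the first capture group can be read from the match array. The sketch below assumes a hypothetical helper name `urlExtension` and returns null when the pattern does not match:

```javascript
var patt1 = /\.([0-9a-z]+)(?:[\?#]|$)/i;

// Hypothetical helper: returns capture group 1 (the extension without the dot).
function urlExtension(url) {
  var m = url.match(patt1);
  return m ? m[1] : null;
}

console.log(urlExtension("filename.txt?foo=1")); // "txt"
console.log(urlExtension("filename.html#hash")); // "html"
console.log(urlExtension("filename-jpg"));       // null
```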
P.S. +1 for @stema who offers pretty good advice on some of the RegExp syntax basics involved.
Solution 3:[3]
Example list:
var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi
//regex flags -- Global, Multiline, Insensitive
var ma1 = 'css/global.css?v=1.2'.match(fileExtensionPattern)[0];
console.log(ma1);
// returns .css
var ma2 = 'index.html?a=param'.match(fileExtensionPattern)[0];
console.log(ma2);
// returns .html
var ma3 = 'default.aspx?'.match(fileExtensionPattern)[0];
console.log(ma3);
// returns .aspx
var ma4 = 'pages.jsp#firstTab'.match(fileExtensionPattern)[0];
console.log(ma4);
// returns .jsp
var ma5 = 'jquery.min.js'.match(fileExtensionPattern)[0];
console.log(ma5);
// returns .js
var ma6 = 'file.123'.match(fileExtensionPattern)[0];
console.log(ma6);
// returns .123
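Because the pattern carries the g flag, `match` returns an array of every matched substring (without capture groups), or null when nothing matches, so indexing with `[0]` on a non-matching string throws. A hedged sketch of a guard, using a hypothetical helper name `firstExtension`:

```javascript
var fileExtensionPattern = /\.([0-9a-z]+)(?=[?#])|(\.)(?:[\w]+)$/gmi;

// Hypothetical helper: first matched extension (with leading dot), or null.
function firstExtension(input) {
  var all = input.match(fileExtensionPattern); // array of matches, or null
  return all ? all[0] : null;
}

console.log('css/global.css?v=1.2'.match(fileExtensionPattern)); // [".css", ".2"]
console.log(firstExtension('css/global.css?v=1.2')); // ".css"
console.log(firstExtension('no-extension-here'));    // null
```

The trailing ".2" of the query string is also matched, which is why only the first element is taken.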
Solution 4:[4]
ONELINER:
let ext = (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1]
The solution above also handles links. It takes everything between the last dot and the first "?" or "#" character, or the end of the string. To treat "?" and "#" as ordinary characters instead of terminators, use /\.([^.]*)$/. To treat only "#" as an ordinary character, use /\.([^.]*?)(?=\?|$)/. Examples:
function getExtension(filename) {
return (filename.match(/\.([^.]*?)(?=\?|#|$)/) || [])[1];
}
// TEST
[
"abcd.Ef1",
"abcd.efg",
"abcd.efg?aaa&a?a=b#cb",
"abcd.efg#aaa__aa?bb",
"abcd",
"abcdefg?aaa&aa=bb",
"abcdefg#aaa__bb",
].forEach(t=> console.log(`${t.padEnd(21,' ')} -> ${getExtension(t)}`))
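To compare the three variants mentioned in this answer side by side, here is a small sketch. The helper `extWith`, the variable names, and the sample input are illustrative assumptions.

```javascript
// Hypothetical helper: apply a pattern and return capture group 1, or undefined.
function extWith(pattern, filename) {
  return (filename.match(pattern) || [])[1];
}

var stopAtQueryOrHash = /\.([^.]*?)(?=\?|#|$)/; // stops at "?" or "#"
var stopAtEndOnly     = /\.([^.]*)$/;           // "?" and "#" are ordinary characters
var stopAtQueryOnly   = /\.([^.]*?)(?=\?|$)/;   // "#" is an ordinary character

console.log(extWith(stopAtQueryOrHash, "abcd.efg#aaa?bb")); // "efg"
console.log(extWith(stopAtEndOnly,     "abcd.efg#aaa?bb")); // "efg#aaa?bb"
console.log(extWith(stopAtQueryOnly,   "abcd.efg#aaa?bb")); // "efg#aaa"
console.log(extWith(stopAtQueryOrHash, "abcd"));            // undefined
```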
Solution 5:[5]
I found this solution in the O'Reilly Regular Expressions Cookbook (chapter 8, section 24). It is case-insensitive and works with .NET, Java, JavaScript, PCRE, Perl, Python & Ruby.
\.[^.\\/:*?"<>|\r\n]+$
A file extension must begin with a dot. Thus, we add ‹\.› to match a literal dot at the start of the regex.
Filenames such as Version 2.0.txt may contain multiple dots. The last dot is the one that delimits the extension from the filename. The extension itself should not contain any dots. We specify this in the regex by putting a dot inside the character class. The dot is simply a literal character inside character classes, so we don’t need to escape it. The ‹$› anchor at the end of the regex makes sure we match .txt instead of .0.
If the string ends with a backslash, or with a filename that doesn’t include any dots, the regex won’t match at all. When it does match, it will match the extension, including the dot that delimits the extension and ...
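As a rough illustration of the behaviour described above, the pattern can be tried in JavaScript as follows. The variable and function names are assumptions, and the forward slash inside the character class is escaped only for readability in a regex literal.

```javascript
// The cookbook pattern written as a JavaScript regex literal.
var extensionPattern = /\.[^.\\\/:*?"<>|\r\n]+$/;

// Hypothetical helper: returns the extension including the dot, or null.
function cookbookExtension(path) {
  var m = path.match(extensionPattern);
  return m ? m[0] : null;
}

console.log(cookbookExtension("Version 2.0.txt")); // ".txt"
console.log(cookbookExtension("README"));          // null (no dot)
console.log(cookbookExtension("C:\\folder\\"));    // null (ends with a backslash)
```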
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Community |
Solution 2 | Community |
Solution 3 | Yassin Mokni |
Solution 4 | |
Solution 5 | James Moberg |