In C, is it faster to use the standard library or to write your own function?

For example, in <ctype.h> there are functions like isalpha().

I want to know: is writing my own isalpha function faster than calling the library's isalpha?


Thanks for all your quick replies! I just want to clarify my question:

So even for the isalpha function? Because you can simply pass a character and check whether it's between 'a' and 'z' or between 'A' and 'Z'?

Another question: when you include a standard header like <ctype.h> and call only one function like isalpha, does the whole file (I mean all of its code) get loaded? My concern is that the extra size will make the program slower.



Solution 1:[1]

Unless you have specific reason to do so (e.g., you have a specific requirement not to use the standard library or you've profiled a very specific use case where you can write a function that performs better), you should always prefer to use a standard library function where one exists rather than writing your own function.

The standard library functions are heavily optimized and very well tested. In addition, the standard library that ships with your compiler can take advantage of compiler intrinsics and other low-level details that you can't portably use in your own code.

Solution 2:[2]

isalpha does not merely check if its argument is in the ranges A-Z, a-z. Quoting the C standard (§7.4.1.2):

The isalpha function tests for any character for which isupper or islower is true, or any character that is one of a locale-specific set of alphabetic characters for which none of iscntrl, isdigit, ispunct, or isspace is true.

In all likelihood you can write a more limited version (as you suggest) that is faster for the subset of cases that it handles, but it won't be the isalpha function. Library routines exist not only to be efficient, but to be complete and correct. Efficiency actually turns out to be the easy part; getting all the edge cases right is where the hard work comes in.


Note also, if you're going to write an optimized version that targets English/ASCII, you can do it rather more efficiently than what you suggested, either with the lookup table that someone else suggested, or my personal preference (edited to fix an error caught by R..):

int isalpha(int c) {
    /* OR-ing with 32 folds 'A'..'Z' onto 'a'..'z'; the unsigned subtraction
       then reduces the two range checks to a single compare (ASCII only) */
    return ((unsigned int)(c | 32) - 97) < 26U;
}

Solution 3:[3]

Generally, you should always use the C libraries when possible. One real reason not to is when you are in an embedded environment and are EXTREMELY space limited (which is usually not the case, and virtually all embedded platforms provide C libraries for the platform).

An example may be that using the isalpha function may actually drag in an object file containing all of the is... functions and you don't need any of them (the object file is the typical minimum unit when linking although some linkers can go down to individual functions).

By writing your own isalpha, you can ensure that it, and only it, is incorporated into your final binary.

In some limited cases you may get faster speeds where you have a very specific thing you want to do and the library is handling a more general case. Again, this is only worthwhile if a particular loop is a bottleneck in the system. You may also want to choose a different speed/space tradeoff than the one chosen by the library writer, an example being changing:

int isalpha (int c) {
    return ((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z'));
}

into:

int isalpha (int c) {
    /* C99 designated initializers; unlisted entries are zero-filled,
       and in full there would be one entry per letter */
    static int map[256] = { ['A']=1, /* ... */ ['Z']=1,
                            ['a']=1, /* ... */ ['z']=1 };
    return map[c & 0xff];
}

This gives a (potentially) faster implementation at the cost of extra storage for the map (and you need to understand your execution environment, since a hard-coded table assumes a particular character set and so isn't portable).

Another reason not to use them is to provide a more secure way of dealing with things like strings, where security/robustness is a CRITICAL factor. This will generally cost you a lot more time, since you have to prove the correctness yourself.

Solution 4:[4]

The standard library functions are written by, presumably, very smart people, and have been thoroughly vetted, debugged and optimized. They've been tested perhaps millions of times over in every conceivable production environment. Chances are very good that your custom function will not be better or faster.

Solution 5:[5]

There are already a bunch of answers here, but none except Stephen Canon's address the most important part: the different semantics. This is the most important factor in choosing which functions to use.

The standard C library isalpha etc. functions are specified to work according to the current locale. If you leave the locale as the default "C" locale (by failing to call setlocale), they have very predictable behavior, but this precludes using the only standardized method for the application to detect and use the system's/user's preferred character encoding, number formatting, message language, and other localization preferences.

On the other hand, if you implement your own isalpha (the optimal implementation is ((unsigned)c|32)-'a'<26, or if you like code that's more self-documenting, ((unsigned)c|('A'^'a'))-'a'<='z'-'a'), it always has very predictable behavior regardless of locale.

I would go so far as to attach a considered harmful label to using the standard isalpha, etc. functions for anything except naive text processing of text assumed to be in the user's locale format. These functions are especially unsuited for parsing configuration files, text-based network transactions, HTML, programming language sources, etc. (The one exception is isdigit, which ISO C requires to be equivalent to return (unsigned)c-'0'<10;.) On the other end of the spectrum, if you're writing an application with advanced natural language text handling (like a word processor or web browser), it will need much more advanced character property handling than the C library can provide, and you should be looking for a good Unicode library.

Solution 6:[6]

Although it's most likely not going to be slower if you're careful writing it, I can almost guarantee that you're not going to make something more optimized than what's already there. The only case I can think of is if it's a function call and you invoke it repeatedly in a hot path where an inline version could help - but if it's a macro, you're not going to beat it. Use the standard.

Solution 7:[7]

In many C/C++ environments (e.g., Visual C++) the source of the 'C Runtime Library' (CRT) is available. Look at the code of the CRT function and then ask yourself: "can you make it better?"

Solution 8:[8]

The only time I do not use something in the standard library is when that something is unavailable unless a particular extension of that library is enabled.

For instance, to get asprintf() in GNU C, you need to define _GNU_SOURCE before including <stdio.h>. There was even a time when strdup() was hit or miss in <string.h>.

If I heavily depend on these extensions, then I try to include them in my code base so that I don't have to write kludges to work around their absence.

Then there are the rare instances when you want to roll your own version of something that gives, for example, POSIX behavior (or something else) by default in a better way.

Other than that, implementing something from stdc on your own seems a little bit silly, beyond the value of being a good learning exercise.

Solution 9:[9]

Interestingly, the implementation of isalpha() in your question is slower than the most common implementation provided with standard C libraries 30 years ago. Remember, this is a function that will be used in a critical inner loop of your average C compiler. :)

I'll admit that current library implementations are probably marginally slower than they used to be, because of character set issues that we have to deal with today.

Solution 10:[10]

A few lines, or one line, of C code does not necessarily translate into the simplest and fastest solution. A memcpy() written as while(len--) *d++ = *s++; is definitely among the slowest. The libraries are generally well done and fast, and you may have a hard time improving upon them. Places where you may see a gain are on specific platforms where you know something about the platform that the compiler doesn't. For example, the target may be a 32-bit processor, but you may know that 64-bit aligned accesses are faster and may want to modify the library to take advantage of that special situation. But in general, for all platforms and all targets, you likely will not do better; target-specific optimizations have already been written for popular targets and are in the C library for the popular compilers.

Solution 11:[11]

This may be a bad answer, but in terms of efficiency, I've seen some pretty bad implementations in standard libraries. Just yesterday I swore loudly when I saw that on my Mac, the standard C library uses an unelidable branch, a table lookup, and bit operations to extract the rune information for almost all <ctype.h> operations. As others have noted, performance tuning is typically the last step of software engineering unless your product has specific requirements. I work as a performance engineer and am therefore overly focused on performance, so I write custom implementations like the one below for all projects. That bridges gaps left by bad standard library implementations, and covers embedded systems where the standard library may not even be present. Almost no one works with such embedded systems, though, which is why taking such a custom approach is typically unreasonable, or a waste of time.

An example in the wild: if you browse the GCC source code you may see names like ISALPHA, which are actually macros hiding non-standard, locale-ignoring implementations chosen for performance. Their style guide recommends using these whenever the code is analyzing characters in a more 'bytewise' fashion (if that makes sense).

Example of better implementations:

// I'm bad at naming
#include <stdbool.h>

bool isalpha_en_us(char c)
{
    switch (c) {
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': 
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n': 
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u': 
    case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B': 
    case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I': 
    case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': 
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W': 
    case 'X': case 'Y': case 'Z':
        return true;
    default:
        return false;
    }
}

Switch statements optimize better in my experience.

It may be ugly, but you only have to write it once.