'Printing all phrases in a file with C program
I need to print all phrases from a file (phrases can end in '.', '?' or '!')
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char* read_file(char *name) {
FILE *file;
char *text;
long num_bytes;
file = fopen(name, "r");
if(!file) {
printf("File could not be opened!");
exit(EXIT_FAILURE);
}
fseek(file, 0, SEEK_END);
num_bytes = ftell(file);
fseek(file, 0, SEEK_SET);
text = (char*) malloc(num_bytes * sizeof(char));
fread(text, 1, num_bytes, file);
fclose(file);
return text;
}
I have this piece of code that kind of works but if my file as the following text: "My name is Maria. I'm 19." the second phrase is printed with a ' ' in the beggining. Can someone please help finding a way to ignore those spaces? Thank you
Solution 1:[1]
To start, you have several problems that will invoke Undefined Behaviour. In
char *line = (char*) malloc(sizeof(text));
sizeof (text)
is the size of a pointer (char *
), not the length of the buffer it points to.
sizeof (char *)
depends on your system, but is very likely to be 8
(go ahead and test this: printf("%zu\n", sizeof (char *));
, if you are curious), which means line
can hold a string of length 7
(plus the null-terminating byte).
Long sentences will easily overflow this buffer, leading to UB.
(Aside: do not cast the return of malloc
in C.)
Additionally, strlen(text)
may not work properly as text
may not include the null-terminating byte ('\0'
). fread
works with raw bytes, and does not understand the concept of a null-terminated string - files do not have to be null-terminated, and fread
will not null-terminate buffers for you.
You should allocate one additional byte to in the read_file
function
text = malloc(num_bytes + 1);
text[num_bytes] = 0;
and place the null-terminating byte there.
(Aside: sizeof (char)
is guaranteed to be 1
.)
Note that ftell
to determine the length of a file should not be relied upon.
isspace
from <ctype.h>
can be used to determine if the current character is whitespace. Its argument should be cast to unsigned char
. Note this will include characters such as '\t'
and '\n'
. Use simple comparison if you only care about spaces (text[i + 1] == ' '
).
A loop can be used to consume the trailing whitespace after matching a delimiter.
Make sure to null-terminate line
before printing it, as %s
expects a string.
Use %u
to print an unsigned int
.
Do not forget to free
your dynamically allocated memory when you are done with it. Additionally, heavily consider checking any library function that can fail has not done so.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void pdie(const char *msg) {
perror(msg);
exit(EXIT_FAILURE);
}
char *read_file(char *name) {
FILE *file = fopen(name, "r");
if (!file)
pdie(name);
fseek(file, 0, SEEK_END);
long num_bytes = ftell(file);
if (-1 == num_bytes)
pdie(name);
fseek(file, 0, SEEK_SET);
char *text = malloc(num_bytes + 1);
if (!text)
pdie("malloc");
if (-1 == num_bytes)
pdie(name);
text[num_bytes] = 0;
if (fread(text, 1, num_bytes, file) != num_bytes)
pdie(name);
fclose(file);
return text;
}
int main(int argc, char **argv) {
if (argc < 2) {
fprintf(stderr, "usage: %s TEXT_FILE\n", argv[0]);
return EXIT_FAILURE;
}
char *text = read_file(argv[1]);
unsigned int count = 0;
size_t length = strlen(text);
size_t index = 0;
char *line = malloc(length + 1);
if (!line)
pdie("malloc");
for (size_t i = 0; i < length; i++) {
line[index++] = text[i];
if (text[i] == '.' || text[i] == '?' || text[i] == '!') {
line[index] = '\0';
index = 0;
printf("[%u] <<%s>>\n", ++count, line);
while (isspace((unsigned char) text[i + 1]))
i++;
}
}
free(text);
free(line);
return EXIT_SUCCESS;
}
Input file:
My name is Maria. I'm 19. Hello world! How are you?
stdout
:
[1] <<My name is Maria.>>
[2] <<I'm 19.>>
[3] <<Hello world!>>
[4] <<How are you?>>
Solution 2:[2]
You can test for a whitespace character by comparing the char in question to ' '.
if(text[i] == ' ')
// text[i] is whitespace
Solution 3:[3]
One possible solution, advance to the next non-whitespace character when you find the end of the sentence. You also need to make sure you've malloc
d enough memory for the current phrase:
#include <ctype.h> // for isspace
...
size_t textLength = strlen(text);
// malloc based on the text length here, plus 1 for the NUL terminator.
// sizeof(text) gives you the size of the pointer, not the size of the
// memory block it points to.
char *line = malloc(textLength+1);
for(size_t i = 0; i < textLength; i++) {
line[index] = text[i];
index++;
if(text[i] == '.' || text[i] == '?' || text[i] == '!') {
count++;
printf("[%d] %s\n", count, line);
memset(line, 0, index + 1);
index = 0;
// advance to the next non-whitespace char
do
{
// advance to the next char (we know the current char is not a space)
i++;
// keep advancing i while the next char is in range of the
// text and the next char is a space.
}while (i+1 < textLength && isspace(text[i+1]) != 0);
}
}
Output:
[1] My name is Maria.
[2] I'm 19.
There's also no need to cast the return value of malloc
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | chameleon |
Solution 3 |