How to avoid zombie processes when running a Command?

A small Iron project calls a Command in some route and returns a Response. Here is the relevant code of the route handler function:

fn convert(req: &mut Request) -> IronResult<Response> {

    // ...
    // init some bindings like destination_html and destination_pdf
    // ...

    convert_to_pdf(destination_html, destination_pdf);

    Ok( Response::with((status::Ok, "Done")) )
}

And the code of the called function:

fn convert_to_pdf(destination_html: &str, destination_pdf: &str) {
    Command::new("xvfb-run")
        .arg("-a")
        .arg("wkhtmltopdf")
        .arg(destination_html)
        .arg(destination_pdf)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
        .expect("failed to execute process");
}

The process works (the file is converted from HTML to PDF) and the response is returned to the browser. Everything is fine, but a zombie process is still there as a child of my app.

I don't know why and I don't know how to avoid it. What could I do?

The wkhtmltopdf command is a long-running process, so I don't want to call it synchronously and wait for it to return. And I don't want to restart my Rust program (the parent of the zombie child) twice a day to kill zombies.



Solution 1:[1]

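Your problem is that you never wait for the child process to terminate, so the operating system cannot release its process-table entry (see the wait(2) man page for a proper explanation). Each zombie keeps its PID and a slot in the process table, and enough of them will eventually prevent new processes from being spawned. A zombie has already exited, so it cannot be killed; it disappears only once its parent reaps it by waiting on it, or once the parent exits and init adopts and reaps it. If you ran wkhtmltopdf from a dedicated thread that waits for the child, the handler could still return immediately and the child would be reaped.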
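
As a rough illustration of the thread-based approach, here is a minimal sketch that uses only the standard library. The helper name spawn_and_reap is hypothetical (it is not part of Iron or std), and the logging is only a placeholder for real error handling:

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread::{self, JoinHandle};

/// Spawns `cmd` and hands the child to a background thread that waits for it.
/// The caller gets control back as soon as the process has started.
/// (Hypothetical helper, not part of any library.)
fn spawn_and_reap(mut cmd: Command) -> io::Result<JoinHandle<io::Result<ExitStatus>>> {
    let mut child = cmd
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;

    // `wait` blocks only this helper thread. Once it returns, the kernel has
    // released the child's process-table entry, so no zombie is left behind.
    Ok(thread::spawn(move || {
        let status = child.wait();
        if let Ok(s) = &status {
            if !s.success() {
                eprintln!("child exited with {}", s);
            }
        }
        status
    }))
}
```

In convert_to_pdf you would build the Command exactly as before and pass it to this helper instead of calling spawn() directly. Dropping the returned JoinHandle detaches the thread, so the route handler does not block on the conversion.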


Beyond that...

You are trying to spawn a command and answer your clients ... without even checking the exit status of wkhtmltopdf. Moreover, you are running as root, which is A BAD PRACTICE (whether you are developing as root or not). And your application is susceptible to denial of service: if many clients request PDFs at once, every request spawns its own process and your server will face resource exhaustion.

(IMHO) You should break your project into two parts:

  1. the server without the rendering process
  2. the PDF rendering engine

The first would send a message to the second: "please generate a PDF with the following parameters (..)". The second would watch the message queue, take the first message, generate the PDF and wait for completion or errors. You could even add a unique #ID to each message, and create an endpoint on the rendering engine to query the status of job #ID.
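
To make the idea concrete, here is a minimal in-process sketch of such a queue, assuming a single worker thread and a std::sync::mpsc channel in place of a separate rendering service. The Job and JobStatus types and the start_worker and submit functions are hypothetical names chosen for illustration:

```rust
use std::collections::HashMap;
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// A request for the rendering engine (hypothetical message format).
struct Job {
    id: u64,
    program: String,
    args: Vec<String>,
}

#[derive(Clone, Debug, PartialEq)]
enum JobStatus {
    Queued,
    Running,
    Done,
    Failed(String),
}

type StatusMap = Arc<Mutex<HashMap<u64, JobStatus>>>;

/// Starts the worker thread and returns the queue's sending half, the
/// shared status table, and the worker's join handle.
fn start_worker() -> (Sender<Job>, StatusMap, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let statuses: StatusMap = Arc::new(Mutex::new(HashMap::new()));
    let table = Arc::clone(&statuses);

    let worker = thread::spawn(move || {
        // Jobs are processed one at a time, in order. The loop ends when
        // every Sender has been dropped.
        for job in rx {
            table.lock().unwrap().insert(job.id, JobStatus::Running);
            let result = Command::new(&job.program)
                .args(&job.args)
                .stdout(Stdio::null())
                .stderr(Stdio::null())
                .status(); // spawns the child and waits for it, so it is reaped
            let outcome = match result {
                Ok(s) if s.success() => JobStatus::Done,
                Ok(s) => JobStatus::Failed(format!("exited with {}", s)),
                Err(e) => JobStatus::Failed(format!("could not start: {}", e)),
            };
            table.lock().unwrap().insert(job.id, outcome);
        }
    });
    (tx, statuses, worker)
}

/// Enqueues a job and records it as queued; the caller returns immediately.
fn submit(tx: &Sender<Job>, statuses: &StatusMap, job: Job) {
    statuses.lock().unwrap().insert(job.id, JobStatus::Queued);
    tx.send(job).expect("worker thread has stopped");
}
```

Because the single worker runs one conversion at a time and always waits for it, this design both reaps every child and caps the number of concurrent wkhtmltopdf processes. A status endpoint would simply look up the job's #ID in the shared table.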

What you are describing is essentially a job queue. Celery is a well-known example, but it is written in Python and relies on third-party software (Redis).

Solution 2:[2]

Instead of std::process::Command, you can use tokio::process::Command with the kill_on_drop(true) option. The documentation describes it as follows:

Controls whether a kill operation should be invoked on a spawned child process when its corresponding Child handle is dropped.

By default, this value is assumed to be false, meaning the next spawned process will not be killed on drop, similar to the behavior of the standard library.

Caveats

On Unix platforms processes must be “reaped” by their parent process after they have exited in order to release all OS resources. A child process which has exited, but has not yet been reaped by its parent is considered a “zombie” process. Such processes continue to count against limits imposed by the system, and having too many zombie processes present can prevent additional processes from being spawned.

tokio docs: https://docs.rs/tokio/latest/tokio/process/struct.Command.html#method.kill_on_drop

Solution 3:[3]

It's ugly, I hate it, it works!

use std::{thread, time};


let _ = thread::spawn(|| {
    // Cull zombies every minute in the background
    loop {
        let minute = time::Duration::from_secs(60);
        thread::sleep(minute);
        println!("Culling Zombies");
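        // Upper bound on PIDs: 99999 on FreeBSD; on Linux, read
        // /proc/sys/kernel/pid_max. PIDs that are not our children
        // just return an error (ECHILD), which is ignored.
        for pid in 1..=99999 {
            let _ = nix::sys::wait::waitpid(
                nix::unistd::Pid::from_raw(pid),
                Some(nix::sys::wait::WaitPidFlag::WNOHANG),
            );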
        }
    }
});
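
A variation on the same periodic-cull idea, sketched here with only the standard library, is to keep the Child handles in a shared list and poll them with try_wait instead of scanning every possible PID. This assumes the spawning code can push each Child into the list. The Children alias and the reap_finished and start_reaper functions are hypothetical names:

```rust
use std::process::Child;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Shared list of children that have been spawned but not yet reaped.
type Children = Arc<Mutex<Vec<Child>>>;

/// Reaps every child that has already exited and keeps the rest.
/// Returns the number of children removed from the list in this pass.
fn reap_finished(children: &Children) -> usize {
    let mut list = children.lock().unwrap();
    let before = list.len();
    // `try_wait` never blocks: Ok(None) means "still running", so keep it.
    // Ok(Some(_)) means the child exited and has now been reaped; an Err
    // means it can no longer be waited on, so it is dropped as well.
    list.retain_mut(|child| matches!(child.try_wait(), Ok(None)));
    before - list.len()
}

/// Starts a background thread that calls `reap_finished` at a fixed interval.
fn start_reaper(children: Children, interval: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        thread::sleep(interval);
        reap_finished(&children);
    })
}
```

With this in place, convert_to_pdf would push the Child returned by spawn() into the shared list, and a single call such as start_reaper(children.clone(), Duration::from_secs(60)) at startup would take care of the rest.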

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Shepmaster
Solution 2 unpluggedcoder
Solution 3 Robert Waksmunski