Dataproc YARN container logs location
I'm aware of the existence of this thread:
Where are the individual Dataproc Spark logs?
However, if I SSH into a worker node VM and navigate to the /tmp folder, this is all I see:
Can anyone point me to the exact location?
Also, for some reason I can't navigate directly from the UI to the stdout/stderr of a single task: whenever I try to access the logs from the link in the UI, it says I'm unable to reach the site.
Solution 1:[1]
The previous answer looks to be outdated.
If you are talking about the container logs, then:
- On clusters with a 1.5 or newer image, YARN log aggregation is enabled by default and the remote log directory is set to the cluster's temp bucket. You can look the location up in /etc/hadoop/conf/yarn-site.xml, under the yarn.nodemanager.remote-app-log-dir configuration property.
- On clusters with a 1.4 or older image, log aggregation is not enabled by default, so the container logs will be under /var/log/hadoop-yarn/userlogs on the worker nodes where the containers were run.
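A quick way to check which case applies on your cluster is to grep the property out of yarn-site.xml. This is only a sketch: it writes a sample yarn-site.xml to /tmp (with a made-up bucket name and UUID) so the extraction pipeline is runnable anywhere; on a real node you would point the same grep/sed pipeline at /etc/hadoop/conf/yarn-site.xml.

```shell
# Sample yarn-site.xml fragment, stand-in for /etc/hadoop/conf/yarn-site.xml
# (bucket name and UUID below are made up for illustration).
cat > /tmp/yarn-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>gs://my-tmp-bucket/abcd-1234/yarn-logs</value>
  </property>
</configuration>
EOF

# Grab the <value> line that follows the property name and strip the tags.
grep -A1 'yarn.nodemanager.remote-app-log-dir' /tmp/yarn-site-sample.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

If the property is absent or empty, you are on a cluster without log aggregation and should look under /var/log/hadoop-yarn/userlogs instead.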
Solution 2:[2]
In Dataproc 1.4 or older versions, the yarn.log-aggregation-enable property in /etc/hadoop/conf/yarn-site.xml is set to false by default, and the container logs are controlled by the yarn.nodemanager.log-dirs property, which is set to /var/log/hadoop-yarn/userlogs by default.
In 1.5 or newer versions, the yarn.log-aggregation-enable property in /etc/hadoop/conf/yarn-site.xml is set to true by default, and the container logs are controlled by the yarn.nodemanager.remote-app-log-dir property, which is set to gs://<cluster-tmp-bucket>/<cluster-uuid>/yarn-logs by default. Check this doc for more details on the Dataproc tmp bucket.
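Putting the pieces together, on a 1.5+ cluster the aggregated logs live at a GCS path built from the cluster's tmp bucket and UUID. A minimal sketch with made-up values (look up the real ones with gcloud dataproc clusters describe, or in the cluster's configuration page); the gsutil call is left commented out because it needs the gsutil CLI and access to the bucket:

```shell
# Hypothetical values -- substitute your cluster's actual tmp bucket and UUID.
CLUSTER_TMP_BUCKET="my-cluster-tmp-bucket"
CLUSTER_UUID="1234abcd-56ef-78ab-90cd-ef1234567890"

# Default remote app-log dir on Dataproc 1.5+:
REMOTE_LOG_DIR="gs://${CLUSTER_TMP_BUCKET}/${CLUSTER_UUID}/yarn-logs"
echo "$REMOTE_LOG_DIR"

# Browse the aggregated container logs (requires gsutil and bucket access):
# gsutil ls -r "${REMOTE_LOG_DIR}/"
```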
In addition to dumping the logs at that location, there are several other ways to view them:
- YARN CLI: If the cluster has not been deleted, SSH into the master node, then run yarn logs -applicationId <app-id>. If you are not sure about the app ID, run yarn application -list -appStates ALL to list all apps. This method works only when log aggregation is enabled.
- YARN Application Timeline server: If you enabled Component Gateway and the cluster has not been deleted, open the cluster's "YARN Application Timeline" link in the "WEB INTERFACES" tab of the cluster's web UI, find the application attempt and its containers, and click the "Logs" link. This method also works only when log aggregation is enabled.
- Cloud Logging: YARN container logs are available in Cloud Logging even after the cluster is deleted.
3.1) When dataproc:dataproc.logging.stackdriver.job.yarn.container.enable is false (which is the default), or the job is submitted through the CLI (e.g., spark-submit) instead of the Dataproc jobs API, the logs are under the projects/<project-id>/logs/yarn-userlogs log name of the cluster resource:
resource.type="cloud_dataproc_cluster"
resource.labels.cluster_name=<cluster-name>
resource.labels.cluster_uuid=<cluster-uuid>
log_name="projects/<project-id>/logs/yarn-userlogs"
3.2) When dataproc:dataproc.logging.stackdriver.job.yarn.container.enable is true, the logs are under the projects/<project-id>/logs/dataproc.job.yarn.container log name of the job resource:
resource.type="cloud_dataproc_job"
resource.labels.job_id=<job_id>
resource.labels.job_uuid=<job_uuid>
log_name="projects/<project-id>/logs/dataproc.job.yarn.container"
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | cyxxy
Solution 2 |