Stage-level data is missing for running BigQuery jobs fetched through the Java BigQuery library

I am using the com.google.cloud.bigquery library to fetch job-level details. We have the following code snippets:

Job job = getBigQuery(projectId, location).getJob(
        JobId.newBuilder().setJob("myJobId").setLocation(location).setProject(projectId).build());


private BigQuery getBigQuery(String projectId, String location) throws IOException {
    // path to your credentials file
    String credentialsPath = "my private key credentials file";
    BigQuery bigQuery;

    bigQuery = BigQueryOptions.newBuilder().setProjectId(projectId).setLocation(location)
            .setCredentials(GoogleCredentials.fromStream(new FileInputStream(credentialsPath))).build()
            .getService();
    return bigQuery;
}

My dependency:

    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-bigquery</artifactId>
        <version>2.10.0</version>
    </dependency>

For completed jobs I have no issue. But for jobs that are still running (for example, those with a duration of more than 1 minute), we get incomplete query plan data, which ultimately causes a NullPointerException.

enter image description here

In the picture, the jobStatistics part of the job shows a warning that it will throw java.lang.NullPointerException.

The main issue is that when our processing checks the queryPlan field, it is not null and reports a size. But when I try to process it with any loop, iterator, or stream, it throws a NullPointerException.

When I fetch the data for the same running job using the API, it returns complete details.

So why does BigQuery give different results through the Java library and the API, and why is the data incomplete on the Java library side? (I have also tried updating the dependency version.) How can I prevent my code from running into the NullPointerException?


The library ultimately uses the same API, but somewhere in its internal processing the query plan data is not built properly while the job is in the running state.



Solution 1:[1]

I was able to test the behaviour of both the code and the API. While the query is running, most of the API response fields under queryPlan are 0, so the plan is incomplete. The queryPlan field shows complete information only after the query has finished executing.

Also, as per the client library documentation, the queryPlan is available only once the query has completed its execution. So the NullPointerException is the expected behaviour while the query is still running (I tested this as well).

To prevent the NullPointerException, access the queryPlan only when the state of the query is DONE.
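
As a rough illustration of that guard, here is a minimal sketch. The class QueryPlanGuard and the method stagesIfDone are hypothetical names, not part of the library. The sketch is deliberately library-independent: it takes a boolean for "job is DONE" and a supplier for the plan, so the plan is never touched while the job is running. In real code I would expect the boolean to come from something like job.getStatus().getState() == JobStatus.State.DONE and the supplier to wrap getQueryPlan() on the job's query statistics, but treat that wiring as an assumption to check against your library version.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not part of google-cloud-bigquery). */
public final class QueryPlanGuard {

    private QueryPlanGuard() {
    }

    /**
     * Returns the query plan stages only when the job has finished.
     *
     * @param jobDone      true when the job state is DONE (assumed to be derived
     *                     by the caller from the job's status)
     * @param planSupplier lazily fetches the plan; it is NOT invoked while the
     *                     job is still running
     */
    public static <T> List<T> stagesIfDone(boolean jobDone, Supplier<List<T>> planSupplier) {
        if (!jobDone) {
            // Job still running: skip the plan entirely instead of iterating it.
            return Collections.emptyList();
        }
        List<T> plan = planSupplier.get();
        // Defensive: treat a missing plan as "no stages".
        return plan == null ? Collections.emptyList() : plan;
    }
}
```

The caller can then loop or stream over the returned list unconditionally. For a running job the list is simply empty, and the job can be reloaded and checked again later.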

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kabilan Mohanraj