'How to get TONE from GDELT GKG table from Google BigQuery?
SELECT
DATE,
EXTRACT(YEAR FROM DATE) AS year,
FIPS as Country,
LOCATIONS,
AVG(TONE) as Avg_Tone,
AVG(Positive Score) as PositiveS,
AVG(Negative Score) as NegativeS,
COUNT(*),
From `gdelt-bq.gdeltv2.gkg_partitioned`,
`gdelt-bq.extra.sourcesbycountry` country,
Where
DATE(_PARTITIONTIME) BETWEEN TIMESTAMP('2002-01-01') AND TIMESTAMP('2020-12-31')
AND SourceCommonName=country.Domain
AND Location like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
Codebook link is http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf The V1.5TONE has TONE, Positive Score and Negative Score and so on. I want to count the average Tone by year. How to get it from Big Query.
Solution 1:[1]
There is the need to cast and split the fields first.
The
date
is a value formated as "yyyymmdd....". Therefore, I suggest casting the value as a string and consider the first four characters as the year.There is no
V1.5TONE
column, butV2Tone
. It consists of a string with several digital numbers seperated by a comma. The string has to be split first. Then each component needs to be casted a decimal number.The table
gdelt-bq.extra.sourcesbycountry
should map an url to a single country. It has duplicated countries to one url! To eliminate at least some duplicated value the innerselect
with the group by is used.To obtain the values grouped by Year,Country all other dimensional columns need to be commented out.
SELECT
#DATE,
substr(cast(date as string),0,4) AS year,
FIPS as Country,
#LOCATIONS,
AVG(cast(split(V2Tone,",")[safe_offset(0)] as decimal )) as Avg_Tone,
AVG(cast(split(V2Tone,",")[safe_offset(1)] as decimal )) as PositiveS,
AVG(cast(split(V2Tone,",")[safe_offset(2)] as decimal )) as NegativeS,
COUNT(*) as counts,
From `gdelt-bq.gdeltv2.gkg_partitioned`
left join
(SELECT Domain, FIPS
from `gdelt-bq.extra.sourcesbycountry`
group by 1,2) country
on SourceCommonName=country.Domain
Where
DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-31')
AND Locations like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
Also in where
segment: Do not mix DATE and TIMESTAMP.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Samuel |