'How to use a df column in a vertica_python SQL query?
I have a dataframe with names that I set to a dictionary, like this:
{1: "Bob",
41: "John",
126: "Jim",
167: "Pete"}
I am using Vertica. I want to be able to pass those names into my query. After connecting my Vertica connection as outlined in the docs, I'm stuck now. My query is:
SELECT COUNT(1)
FROM my_table
WHERE name = {name}
I have the values from that dictionary in a df, so ideally the return would be in the new_value column like so:
id name new_value
1 Bob 2
2 John 47
3 Jim 22
4 Pete 13
I'm not sure how to add that name column into this vertica_python query, and have it return to the df after getting the count. Any help would be super appreciated!
Solution 1:[1]
I'm not a Python programmer - "if you can't do it in SQL, do it in C; if you can't do it in C, it's not worth doing" :-]]
But the query you need to generate is - in plain SQL:
SELECT
name
, COUNT(*) AS new_value
FROM my_table
WHERE name IN(
'Bob'
, 'John'
, 'Jim'
, 'Pete'
)
GROUP BY
name
;
This way, you fire one query to get all the answers.
If, from your comment, you have 10,000 and more "names", then go:
CREATE TABLE searchtable (
name VARCHAR(64) -- hope it is long enough ...
) UNSEGMENTED ALL NODES;
Then, I'm sure there is a way to write a Python data frame to an existing Vertica table.
Once the table is populated, go:
SELECT
my_table.name
, COUNT(*) AS new_value
FROM my_table
JOIN searchtable USING(name)
GROUP BY
my_table.name
;
... to get the results, and pull that back into a new data frame.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |