'How to use a df column in a vertica_python SQL query?

I have a dataframe with names that I set to a dictionary, like this:

{1: "Bob",
41: "John",
126: "Jim",
167: "Pete"}

I am using Vertica. I want to be able to pass those names into my query. After connecting my Vertica connection as outlined in the docs, I'm stuck now. My query is:

SELECT COUNT(1)
FROM my_table
WHERE name = {name}

I have the values from that dictionary in a df, so ideally the return would be in the new_value column like so:

id name new_value
 1 Bob    2
 2 John   47
 3 Jim    22
 4 Pete   13

I'm not sure how to add that name column into this vertica_python query, and have it return to the df after getting the count. Any help would be super appreciated!



Solution 1:[1]

I'm not a Python programmer - "if you can't do it in SQL, do it in C; if you can't do it in C, it's not worth doing" :-]]

But the query you need to generate is - in plain SQL:

SELECT
  name
, COUNT(*) AS new_value
FROM my_table
WHERE name IN(
  'Bob'
, 'John'
, 'Jim'
, 'Pete'
)
GROUP BY
  name
;

This way, you fire one query to get all the answers.

If, from your comment, you have 10,000 and more "names", then go:

CREATE TABLE searchtable (
  name VARCHAR(64) -- hope it is long enough ...
) UNSEGMENTED ALL NODES;

Then, I'm sure there is a way to write a Python data frame to an existing Vertica table.

Once the table is populated, go:

SELECT
  my_table.name
, COUNT(*) AS new_value
FROM my_table
JOIN searchtable USING(name)
GROUP BY
  my_table.name
;

... to get the results, and pull that back into a new data frame.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1