Pentaho Kettle: Dynamic Parameters and Query
I am trying to insert data into a destination DB from a dynamic data source, using a dynamically generated query to fetch the data. That is, the columns and the table to insert into may vary. However, those column and table names are saved as rows in a table at the destination DB, so even though the process is dynamic, I already know where to insert and what the columns are. So, yeah, that's the back story of my problem.
Anyway, the problem is on the insert side. I was able to generate the SELECT statement to read from the data source, and did get the data, by manipulating variables to build the query.
This is what it looks like now. Do mind the Get Variables step; that is how I generate the SQL queries.
Inside the Table Input step, it looks like this:
Here is the Execute SQL Statement:
See the dynamic part? Anyway, the thing is that ${INSERT_VALUES} in the insert script expands to a string of '?' characters, one per column, so that I can tell how many columns there are; the data for those placeholders is supposed to be provided by the Table Input step.
Problem 1: As you can notice in the Execute SQL Statement step, I did not pass any parameters to fill the '?' placeholders in the insert. That is because I do not know what to put there; the parameters/columns vary every time, depending on the table to insert into. Is there a way to approach this?

Problem 2: Does Pentaho allow a sort of double substitution? That is, substitute each variable with its value, and then substitute each '?' with the parameter value?
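To make that concrete, here is a sketch of what one generated statement might expand to; the table and column names are hypothetical, since the real ones come from the rows stored at the destination DB:

```sql
-- Hypothetical expansion of the insert script: a three-column target table,
-- so ${INSERT_VALUES} expands to three '?' placeholders.
INSERT INTO some_target_table (col_a, col_b, col_c)
VALUES (?, ?, ?);
-- Each ? is expected to be bound to a field streamed in from the Table Input step.
```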
Solution 1:[1]
If I understood your problem correctly, every time you run, you are selecting a specific column, let's say colA, from a given table, tableX, and you want to insert into tableY a row that looks like this:

tableX, colA, valueA

Is that it?
Well, in that case, a few notes:
1) You're not really using the output of the Get Variables step in the Table Input step. In the Table Input you're reading the variable values, not the fields that are the output of the Get Variables step.
2) You shouldn't use the Execute Row SQL Script step here. A Table Output step is a better option.
Here's a better recipe:
1) Get Variables: it should read the following variables and create the following output fields:

- ${SQL_STR} maps to sql_str
- ${SELECT_COLUMN} maps to select_column
2) Table Input: your query should be something like

```sql
Select
  ? as table,
  ? as column,
  ${SELECT_COLUMN} as value
From (
  ${SQL_STR}
) foo
```
The output of your Table Input step is made of 3 fields: the string value of ${SQL_STR}, the string value of ${SELECT_COLUMN}, and the actual value that comes from the source table.
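As a concrete illustration (the source table and column names here are hypothetical): if ${SQL_STR} held SELECT colA FROM tableX and ${SELECT_COLUMN} held colA, the query after variable substitution, but before the ? placeholders are bound, would look roughly like this:

```sql
Select
  ? as table,   -- bound from the incoming sql_str field
  ? as column,  -- bound from the incoming select_column field
  colA as value
From (
  SELECT colA FROM tableX
) foo
-- The ? placeholders are filled from the previous step when
-- "Insert data from step" points at the Get Variables step.
```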
3) A Table Output step writes to table ${DATA_TABLE_SQL}, with the following field mapping:

- table -> dp_table_relation_id
- column -> whatever column name you have on the target table for column names
- value -> whatever column name you have on the target table for values
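For reference, one possible shape of a target table consistent with that mapping; this is only a sketch, and every name except dp_table_relation_id is a placeholder rather than something from the question:

```sql
CREATE TABLE data_table (  -- stands in for ${DATA_TABLE_SQL}
  id                   INT AUTO_INCREMENT PRIMARY KEY, -- see the remark below
  dp_table_relation_id VARCHAR(255),  -- receives the "table" field
  column_name          VARCHAR(255),  -- receives the "column" field
  column_value         VARCHAR(255)   -- receives the "value" field
);
```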
Remark: I didn't include the id, but it shouldn't be hard to handle. Either set it as an auto_increment field on the target table, or, in a separate query, get the max id already stored in the target table, add a sequence to the output of your Table Input step, add those two integers together, and use the result as your new id.
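A minimal sketch of that separate query, assuming the target table and its id column are named as in the sketch above:

```sql
-- Current maximum id; COALESCE covers the empty-table case.
-- Adding this to the counter from an Add sequence step yields each new row's id.
SELECT COALESCE(MAX(id), 0) AS max_id FROM data_table;
```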
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | nsousa |