Is there a way to dynamically create new arrays from a dataframe?

I have a table that looks like

|Category|number|absorbance|protein1|protein2|
|--------|------|----------|--------|--------|
|a|int|float|float|float|
|a|int|float|float|float|
|a|int|float|float|float|
|a|int|float|float|float|
|b|int|float|float|float|
|b|int|float|float|float|
|b|int|float|float|float|
|b|int|float|float|float|
|b|int|float|float|float|
|c|int|float|float|float|
|c|int|float|float|float|
|c|int|float|float|float|
|c|int|float|float|float|
|c|int|float|float|float|

and I want to extract vectors of absorbance, protein1, and protein2 for each of a, b, and c.

There could be more than 3 categories, and the categories need not have the same number of rows. Is there something built into DataFrames that can extract these subarrays?

I tried dynamically coding this using

for sample_num=1:number_samples
    for i=1:3
        #determine col type
        if i==1
            data_col_name="abs"
        elseif i==2
            data_col_name="B-actin"
        else
            data_col_name="pink-1"
        end
        temp=string(sample_num,"_",data_col_name)
        column_count=4*(sample_num-1)+i
        @eval($:string(sample_num,data_col_name)=$file_contents[:,column_count])
        #string_as_varname(temp,file_contents[:,column_count])
    end
end

But I got an error in method definition:

function Base.string must be explicitly imported to be extended
top-level scope@none:0
top-level scope@none:1
eval@boot.jl:360[inlined]
top-level scope@Local: 13[inlined]


Solution 1:[1]

I am not entirely sure what you need, but most likely you are looking for a GroupedDataFrame.

Start by creating one:

julia> using DataFrames

julia> df = DataFrame(id=["a", "a", "b", "b"], col1=1:4, col2=11:14)
4×3 DataFrame
 Row │ id      col1   col2
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           1     11
   2 │ a           2     12
   3 │ b           3     13
   4 │ b           4     14

julia> gdf = groupby(df, :id)
GroupedDataFrame with 2 groups based on key: id
First Group (2 rows): id = "a"
 Row │ id      col1   col2
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           1     11
   2 │ a           2     12
⋮
Last Group (2 rows): id = "b"
 Row │ id      col1   col2
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ b           3     13
   2 │ b           4     14

Now to get column :col1 from group "a" you can just write:

julia> gdf[("a",)].col1
2-element view(::Vector{Int64}, [1, 2]) with eltype Int64:
 1
 2

You can also set the group value and column name programmatically, e.g.:

julia> group = "b"
"b"

julia> column = "col2"
"col2"

julia> gdf[(group,)][:, column]
2-element Vector{Int64}:
 13
 14
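
If you do not know the group values in advance, you can also loop over all groups. Here is a minimal sketch, assuming a data frame shaped like the table in the question. The column names are taken from the question, while the values and the unequal group sizes are made up for illustration:

```julia
using DataFrames

# Hypothetical data: same columns as the question's table, invented values.
df = DataFrame(
    Category   = ["a", "a", "b", "b", "b"],
    number     = 1:5,
    absorbance = [0.11, 0.12, 0.21, 0.22, 0.23],
    protein1   = [1.1, 1.2, 2.1, 2.2, 2.3],
    protein2   = [3.1, 3.2, 4.1, 4.2, 4.3],
)

gdf = groupby(df, :Category)

# pairs(gdf) yields (key, group) pairs; the key exposes the grouping
# value by column name, and the group is a SubDataFrame.
for (key, sdf) in pairs(gdf)
    println(key.Category, ": ", nrow(sdf), " rows, absorbance = ", sdf.absorbance)
end
```

Groups may have different numbers of rows, so this works no matter how many categories there are or how large each one is.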

EDIT

How to get a dictionary mapping (key, column) pairs to subvectors:

julia> Dict((key[1], col) => gdf[key][:, col] for key in keys(gdf), col in valuecols(gdf))
Dict{Tuple{String, Symbol}, Vector{Int64}} with 4 entries:
  ("a", :col2) => [11, 12]
  ("b", :col2) => [13, 14]
  ("a", :col1) => [1, 2]
  ("b", :col1) => [3, 4]
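
If you prefer names like the ones you tried to build in your loop, the same idea works with string keys, so there is no need to create variables with `@eval`. This is only a sketch: the data below is made up, and the `category_column` key format and the `vectors` name are my assumptions, not something DataFrames.jl prescribes:

```julia
using DataFrames

# Hypothetical data: same columns as the question's table, invented values.
df = DataFrame(
    Category   = ["a", "a", "b", "b", "b"],
    number     = 1:5,
    absorbance = [0.11, 0.12, 0.21, 0.22, 0.23],
    protein1   = [1.1, 1.2, 2.1, 2.2, 2.3],
    protein2   = [3.1, 3.2, 4.1, 4.2, 4.3],
)

gdf = groupby(df, :Category)

# Only the measurement columns; :number is left out on purpose.
cols = ["absorbance", "protein1", "protein2"]

# One entry per (category, column) combination, keyed by a string such as "b_protein1".
vectors = Dict(
    string(key.Category, "_", col) => gdf[key][:, col]
    for key in keys(gdf), col in cols
)

vectors["b_protein1"]  # the protein1 values of category "b", as a copied Vector
```

Looking a vector up by its key then takes the place of referring to a dynamically named variable.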

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1