'Generic Method to Flatten a Collection of Nested Objects into a DataTable?
I have a list of objects, which in turn contain nested lists of further objects. I would like to flatten the object graph into a DataTable
.
I found code which takes a collection of objects and maps them to a DataTable
(referenced below), but it assumes that properties are simple types that can reliably be converted to a string value.
I am thinking this is only possible through recursion, but perhaps there are better ways to do this.
Data Model
Imagine that we have a List
of Customer
objects:
public class Item
{
public string SKU { get; set; }
public string Description { get; set; }
public double Price { get; set; }
}
public class Order
{
public string ID { get; set; }
public List<Item> Items { get; set; }
}
public class Customer
{
public string Name { get; set; }
public string Email { get; set; }
public List<Order> Orders { get; set; }
}
I would like to fully flatten the collection into a single DataTable
with the following DataColumn
s:
Customer.Name
Customer.Email
Customer.Order.ID
Customer.Order.Item.SKU
Customer.Order.Item.Description
Customer.Order.Item.Price
Sample Implementation
The following is a sample implementation found elsewhere on Stack Overflow, but this will only work if the object exclusively contains simple properties, such as primitives or strings, and not other nested objects. I added a comment in the function where I think we can apply recursion, but I'm not entirely sure it will work.
public static DataTable CreateDataTableFromAnyCollection<T>(IEnumerable<T> list)
{
Type type = typeof(T);
var properties = type.GetProperties();
DataTable dataTable = new DataTable();
foreach (PropertyInfo info in properties)
{
dataTable.Columns.Add(new DataColumn(info.Name, Nullable.GetUnderlyingType(info.PropertyType) ?? info.PropertyType));
}
foreach (T entity in list)
{
object[] values = new object[properties.Length];
for (int i = 0; i < properties.Length; i++)
{
values[i] = properties[i].GetValue(entity,null); // if not primitive pass a recursive call
}
dataTable.Rows.Add(values);
}
return dataTable;
}
Solution 1:[1]
If you're only working with one type of model object (in this case, Customer
), then I recommend @Huntbook's answer, as that dramatically simplifies this problem.
That said, if you truly need for this to be a generic method because e.g., you'll be handling a variety of different model objects (i.e., not exclusively Customer
), then you can certainly expand your proposed CreateDataTableFromAnyCollection<T>()
method to support recursion, though it isn't a trivial task.
Approach
The recursion process isn't quite as straight forward as you might expect since you're looping through a collection of objects, yet only need to determine the definition of the DataTable
once.
As a result, it makes more sense to separate out your recursive functionality into two separate methods: One for establishing the schema, the other for populating the DataTable
. I propose:
EstablishDataTableFromType()
, which dynamically establishes the schema of theDataTable
based on a givenType
(along with any nested types), andGetValuesFromObject()
, which, for each (nested) object in your source list, adds the values from each property to a list of values, which can subsequently be added to aDataTable
.
Challenges
The above approach glosses over a number of challenges introduced when working with complex objects and collections. These include:
How do we determine if a property is a collection—and, thus, subject to recursion? We will be able to use the
Type.GetInterfaces()
andType.GetGenericTypeDefinition()
methods to identify if a type implementsICollection<>
. I implement this in the privateIsList()
method below.If it is a collection, how do we determine what type the collection contains (e.g.,
Order
,Item
)? We will be able to use theType.GetGenericArguments()
to determine what the generic type argument for theICollection<>
is. I implement this in the privateGetListType()
method below.How do we ensure all data is represented, given that each nested item necessitates an additional row? We will need to establish a new record for every permutation in the object graph.
What happens if you have two collections on an object, as per @DBC's question in the comments? My code assumes you'll want a permutation for each. So if you added
Addresses
toCustomer
, this might yield something like:Customer.Name … Customer.Orders.ID Customer.Orders.Items.SKU Customer.Addresses.PostalCode Bill Gates … 0 001 98052 Bill Gates … 0 002 98052 Bill Gates … 0 001 98039 Bill Gates … 0 002 98039 What happens if an object has two collections of the same
Type
? Your proposal infers that theDataColumn
names should be delineated by theType
, but that would introduce a naming conflict. To address that, I assume the propertyName
should be used as the delineator, not the propertyType
. E.g., in your sample model, theDataColumn
will beCustomer.Orders.Items.SKU
, notCustomer.Order.Item.SKU
.How do you differentiate between complex objects and "primitive" objects? Or, more accurately, objects that can be reliably serialized to a meaningful value? Your question assumes that collection properties will contain complex objects and other properties won't, but that's not necessarily true. E.g., a property that points to a complex object, or, conversely, a collection that contains simple objects:
public class Order { public List<string> CouponCodes { get; set; } = new(); public Address ShipTo { get; set; } }
To address this, I rely on @julealgon's answer to How do I tell if a type is a "simple" type? i.e. holds a single value. I implement this in the private
IsSimple()
method below.
Solution
The solution to this is considerably more complicated than the sample code you referenced. I'll provide a brief summary of each method below. In addition, I've included XML Docs and some comments within the code. If you have questions about any specific functionality, however, please ask and I'll provide further clarification.
EstablishDataTableFromType()
: This method will establish a DataTable
definition based on a given Type
. Instead of simply looping through values, however, this method will recurse over any complex types discovered—including those contained within collections.
/// <summary>
/// Populates a <paramref name="dataTable"/> with <see cref="DataColumn"/>
/// definitions based on a given <paramref name="type"/>. Optionally prefixes
/// the <see cref="DataColumn"/> name with a <paramref name="prefix"/> to
/// handle nested types.
/// </summary>
/// <param name="type">
/// The <see cref="Type"/> to derive the <see cref="DataColumn"/> definitions
/// from, based on properties.
/// </param>
/// <param name="dataTable">
/// The <see cref="DataTable"/> to add the <see cref="DataColumn"/>s to.
/// </param>
/// <param name="prefix">
/// The prefix to prepend to the <see cref="DataColumn"/> name.
/// </param>
private static void EstablishDataTableFromType(Type type, DataTable dataTable, string prefix = "") {
var properties = type.GetProperties();
foreach (System.Reflection.PropertyInfo property in properties)
{
// Handle properties that can be meaningfully converted to a string
if (IsSimple(property.PropertyType))
{
dataTable.Columns.Add(
new DataColumn(
prefix + property.Name,
Nullable.GetUnderlyingType(property.PropertyType)?? property.PropertyType
)
);
}
// Handle collections
else if (IsList(property.PropertyType))
{
// If the property is a generic list, detect the generic type used
// for that list
var listType = GetListType(property.PropertyType);
// Recursively call this method in order to define columns for
// nested types
EstablishDataTableFromType(listType, dataTable, prefix + property.Name + ".");
}
// Handle complex properties
else {
EstablishDataTableFromType(property.PropertyType, dataTable, prefix + property.Name + ".");
}
}
}
GetValuesFromObject()
: This method will take a source Object
and, for every property, add the value of the property to an object[]
. If an Object
contains an ICollection<>
property, it will recurse over that property, establishing an object[]
for every permutation.
/// <summary>
/// Populates a <paramref name="target"/> list with an array of <see cref="
/// object"/> instances representing the values of each property on a <paramref
/// name="source"/>.
/// </summary>
/// <remarks>
/// If the <paramref name="source"/> contains a nested <see cref="ICollection{T}"/>,
/// then this method will be called recursively, resulting in a new record for
/// every nested <paramref name="source"/> in that <see cref="ICollection{T}"/>.
/// </remarks>
/// <param name="type">
/// The expected <see cref="Type"/> of the <paramref name="source"/> object.
/// </param>
/// <param name="source">
/// The source <see cref="Object"/> from which to pull the property values.
/// </param>
/// <param name="target">
/// A <see cref="List{T}"/> to store the <paramref name="source"/> values in.
/// </param>
/// <param name="columnIndex">
/// The index associated with the property of the <paramref name="source"/>
/// object.
/// </param>
private static void GetValuesFromObject(Type type, Object? source, List<object?[]> target, ref int columnIndex)
{
var properties = type.GetProperties();
// For each property, either write the value or recurse over the object values
for (int i = 0; i < properties.Length; i++)
{
var property = properties[i];
var value = source is null? null : property.GetValue(source, null);
var baseIndex = columnIndex;
// If the property is a simple type, write its value to every instance of
// the target object. If there are multiple objects, the value should be
// written to every permutation
if (IsSimple(property.PropertyType))
{
foreach (var row in target)
{
row[columnIndex] = value;
}
columnIndex++;
}
// If the property is a generic list, recurse over each instance of that
// object. As part of this, establish copies of the objects in the target
// storage to ensure that every a new permutation is created for every
// nested object.
else if (IsList(property.PropertyType))
{
var list = value as ICollection;
var collated = new List<Object?[]>();
// If the list is null or empty, rely on the type definition to insert
// null values into each DataColumn.
if (list is null || list.Count == 0) {
GetValuesFromObject(GetListType(property.PropertyType), null, collated, ref columnIndex);
continue;
}
// Otherwise, for each item in the list, create a new row in the target
// list for its values.
foreach (var item in list)
{
columnIndex = baseIndex;
var values = new List<Object?[]>();
foreach (var baseItem in target)
{
values.Add((object?[])baseItem.Clone());
}
GetValuesFromObject(item.GetType(), item, values, ref columnIndex);
collated.AddRange(values);
}
// Finally, write each permutation of values to the target collection
target.Clear();
target.AddRange(collated);
}
// If the property is a complex type, recurse over it so that each of its
// properties are written to the datatable.
else
{
GetValuesFromObject(property.PropertyType, value, target, ref columnIndex);
}
}
}
CreateDataTableFromAnyCollection
: The original method you provided obviously needs to be updated to call the EstablishDataTableFromType()
and GetValuesFromObject()
methods, thus supporting recursion, instead of simply looping over a flat list of properties. This is easy to do, though it does require a bit of scaffolding given how I've written the GetValuesFromObject()
signature.
/// <summary>
/// Given a <paramref name="list"/> of <typeparamref name="T"/> objects, will
/// return a <see cref="DataTable"/> with a <see cref="DataRow"/> representing
/// each instance of <typeparamref name="T"/>.
/// </summary>
/// <remarks>
/// If <typeparamref name="T"/> contains any nested <see cref="ICollection{T}"/>, the
/// schema will be flattened. As such, each instances of <typeparamref name=
/// "T"/> will have one record for every nested item in each <see cref=
/// "ICollection{T}"/>.
/// </remarks>
/// <typeparam name="T">
/// The <see cref="Type"/> that the source <paramref name="list"/> contains a
/// list of.
/// </typeparam>
/// <param name="list">
/// A list of <typeparamref name="T"/> instances to be added to the <see cref=
/// "DataTable"/>.
/// </param>
/// <returns>
/// A <see cref="DataTable"/> containing (at least) one <see cref="DataRow"/>
/// for each item in <paramref name="list"/>.
/// </returns>
public static DataTable CreateDataTableFromAnyCollection<T>(IEnumerable<T> list)
{
var dataTable = new DataTable();
EstablishDataTableFromType(typeof(T), dataTable, typeof(T).Name + ".");
foreach (T source in list)
{
var values = new List<Object?[]>();
var currentIndex = 0;
// Establish an initial array to store the values of the source object
values.Add(new object[dataTable.Columns.Count]);
// Assuming the source isn't null, retrieve its values and add them to the
// DataTable.
if (source is not null)
{
GetValuesFromObject(source.GetType(), source, values, ref currentIndex);
}
// If the source object contains nested lists, then multiple permutations
// of the source object will be returned.
foreach (var value in values)
{
dataTable.Rows.Add(value);
}
}
return dataTable;
}
IsSimple()
: A helper method to determine if a property type can be reliably serialized to a meaningful string value. If it can't, then the above functions will recurse over it, setting each of its property values to a DataColumn
. This is based on @julealgon
's answer to How do I tell if a type is a "simple" type? i.e. holds a single value.
/// <summary>
/// Determine if a given <see cref="Type"/> can be reliably converted to a single
/// <see cref="String"/> value in the <see cref="DataTable"/>.
/// </summary>
/// <param name="type">
/// The <see cref="Type"/> to determine if it is a simple type.
/// </param>
/// <returns>
/// Returns <c>true</c> if the <paramref name="type"/> can be reliably converted
/// to a meaningful <see cref="String"/> value.
/// </returns>
private static bool IsSimple(Type type) =>
TypeDescriptor.GetConverter(type).CanConvertFrom(typeof(string));
IsList()
: Here, I've added a simple helper method for determining if the Type
of a given property is a generic ICollection<>
or not. It is used by both EstablishDataTableFromType()
as well as GetValuesFromObject()
. This rely's on the Type
type's IsGenericType
and GetGenericTypeDefinition()
. I used ICollection<>
not IEnumerable<>
since e.g. String
implements IEnumerable<>
(you don't want a new column for every character in a string!)
/// <summary>
/// Simple helper function to determine if a given <paramref name="type"/> is a
/// generic <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="type">
/// The <see cref="Type"/> to determine if it is an <see cref="ICollection{T}"/>.
/// </param>
/// <returns>
/// Returns <c>true</c> if the <paramref name="type"/> is a generic <see cref=
/// "ICollection{T}"/>.
/// </returns>
private static bool IsList(Type type) => type
.GetInterfaces()
.Any(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(ICollection<>));
GetListType()
: Finally, I've added another simple helper method for determining the generic Type
of a given generic ICollection<>
. It is used by both EstablishDataTableFromType()
as well as GetValuesFromObject()
. This is very similar to the IsList()
method above, except that it returns the specific Type
, instead of just confirming that the property type implements the ICollection<>
interface.
/// <summary>
/// Simple helper function to determine the generic <paramref name="type"/> of
/// an <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="type">
/// The <see cref="Type"/> implementing <see cref="ICollection{T}"/>.
/// </param>
/// <returns>
/// Returns the generic <see cref="Type"/> associated with the <see cref=
/// "ICollection{T}"/> implemented for the <paramref name="type"/>.
/// </returns>
private static Type GetListType(Type type) => type
.GetInterfaces()
.Where(i => i.IsGenericType && typeof(ICollection<>) == i.GetGenericTypeDefinition())
.FirstOrDefault()
.GetGenericArguments()
.Last();
Validation
Here's a very simple test (written for XUnit) to validate the basic functionality. This only confirms that the number of DataRow
instances in the DataTable
match the anticipated number of permutations; it doesn't validate the actual data in each record—though I've separately validated that the data is correct:
[Fact]
public void CreateDataTableFromAnyCollection()
{
// ARRANGE
var customers = new List<Customer>();
// Create an object graph of Customer, Order, and Item instances, three per
// collection
for (var i = 0; i < 3; i++)
{
var customer = new Customer() {
Email = "Customer" + i + "@domain.tld",
Name = "Customer " + i
};
for (var j = 0; j < 3; j++)
{
var order = new Order()
{
ID = i + "." + j
};
for (var k = 0; k < 3; k++)
{
order.Items.Add(
new Item()
{
Description = "Item " + k,
SKU = "0123-" + k,
Price = i + (k * .1)
}
);
}
customer.Orders.Add(order);
}
customers.Add(customer);
}
// ACT
var dataTable = ParentClass.CreateDataTableFromAnyCollection<Customer>(customers);
// ASSERT
Assert.Equal(27, dataTable.Rows.Count);
// CLEANUP VALUES
dataTable.Dispose();
}
Note: This assumes that your
CreateDataTableFromAnyCollection()
method is placed in a class calledParentClass
; obviously, you'll need to adjust that based on your application's structure.
Conclusion
This should give you a good idea of how to dynamically map an object graph into a flattened DataTable
, while also addressing common scenarios you're likely going to encounter, such as properties referencing complex objects (e.g., the ShipTo
example above) and null or empty collections. Obviously, your specific data model may introduce additional challenges unforeseen in my implementation; in that case, this should provide a solid foundation for you to build off of.
Solution 2:[2]
In that case, you will have a row for each Item
in every Order
in the Orders
list of every Customer
object.
If you already know what columns you want to have in the DataTable
, you don't need to generalize that much. The CreateDataTableFromAnyCollection<T>()
method you found, is very general (and generic); creating a DataTable
from any one-dimensional structure generic type. In fact, it's so general it requires using Reflection
!
A recursive call is meant to break down complex functions, this contradicts the point of a generic method. So, such a method would either have to consist of non-generic sub-functions, that would in turn be called recursively, or make use of System.Reflection.MethodInfo.Invoke
to recurse. Read more on this here.
Since you already know what you need, just do:
public static DataTable CreateDataTableFromCustomerList(List<Customer> list) {
DataTable dataTable = new DataTable();
dataTable.Columns.AddRange(new DataColumn[] {
new DataColumn("CustomerName"),
new DataColumn("CustomerEmail"),
new DataColumn("OrderID"),
new DataColumn("OrderItemSKU"),
new DataColumn("OrderItemDescription"),
new DataColumn("OrderItemPrice"),
});
foreach (Customer customer in list) {
object[] values = new object[6];
// { customer.Name, customer.Email };
values[0] = customer.Name;
values[1] = customer.Email;
foreach (Order order in customer.Orders) {
values[2] = order.ID;
foreach (Item item in order.Items) {
values[3] = item.SKU;
values[4] = item.Desc;
values[5] = item.Price;
dataTable.Rows.Add(values);
}
}
}
return dataTable;
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | marc_s |
Solution 2 |