Having trouble setting up multiple tables in AWS Glue from a single bucket

So, I've used Glue before, but it's been with a single file <> single folder relationship.

What I'm trying to do now is have the crawler create an individual table for each folder in a structure like this:

 - Data Bucket
     - Table 1 Folder
         - file1.csv
         - file2.csv
     - Table 2 Folder
         - file1.csv
         - file2.csv

...and so on.

But every time I create the crawler and set the Data Bucket as the data source, I only get a single table created. I've tried every combo of the "create single schema ...etc" I can think of.

I'm hoping that I don't have to add each sub-folder as a separate data source as my ultimate goal is to translate it eventually into an RDS instance. Hoping to keep the high-level bucket as the single data source if possible. I can easily tweak folder/file structure if needed.

And yes, I'm aware of partitioning, but isn't that only applicable to individual tables?

Thanks!



Solution 1:[1]

I ran into the same issue, and after digging into the Glue docs I found that setting the table level in the crawler's output configuration does the trick.

The table level seems to be counted from the bucket level. In your case, I believe setting the table level to 2 (the first folder after the root) would do the trick: 2 means that the table definitions start at that level.
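
For anyone scripting this rather than using the console, here is a minimal sketch of the same setting through boto3. It assumes the crawler's `Configuration` JSON accepts a `Grouping.TableLevelConfiguration` value, which is how the table level is exposed in the API as far as I can tell. The crawler name, role ARN, database name, and bucket name are all placeholders, not values from the question.

```python
import json


def build_crawler_args(name, role_arn, database, bucket, table_level=2):
    """Build keyword arguments for glue.create_crawler (all names are placeholders)."""
    # Table level 2 = the first folder below the bucket, per the answer above.
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableLevelConfiguration": table_level},
    }
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # A single data source: the bucket root.
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/"}]},
        # The Glue API expects the configuration as a JSON string.
        "Configuration": json.dumps(configuration),
    }


def create_crawler(args):
    """Create the crawler; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the builder above works without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**args)
```

Calling `create_crawler(build_crawler_args("my-crawler", "<role-arn>", "my_database", "data-bucket"))` would then register a crawler that keeps the bucket as its only data source.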

Solution 2:[2]

I've been trying to accomplish the same thing. I was hoping that Glue would magically see the different folders and automatically create separate tables. Glue seems to want to create a single table, especially when the schemas overlap. In my example, I'm using US census data, so there are some common fields, especially at the beginning of each file.

In the end, I was able to get this to work by creating multiple data stores in the Glue Crawler. By doing this, it would create the five separate tables I wanted, but I had to add each folder manually. Still hoping to find a way to get Glue to discover them automatically.
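
One way to avoid adding each folder by hand might be to generate the list of data stores from the bucket itself. The sketch below is an assumption-laden illustration, not something I have run against the census data: it lists the top-level "folders" with S3's `list_objects_v2` and a `/` delimiter, then turns each one into an entry for the crawler's `S3Targets`. The bucket name is a placeholder.

```python
def folder_targets(bucket, common_prefixes):
    """Turn S3 CommonPrefixes entries into Glue S3Targets, one per folder."""
    return [{"Path": f"s3://{bucket}/{p['Prefix']}"} for p in common_prefixes]


def list_top_level_folders(bucket):
    """Return the CommonPrefixes under the bucket root; needs AWS credentials and boto3."""
    import boto3  # imported lazily so folder_targets works without AWS access

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    prefixes = []
    # With Delimiter="/", S3 groups keys by their first path segment.
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        prefixes.extend(page.get("CommonPrefixes", []))
    return prefixes
```

The result of `folder_targets("data-bucket", list_top_level_folders("data-bucket"))` could be passed as `Targets={"S3Targets": ...}` when creating or updating the crawler, so new folders are picked up by re-running the script.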


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lucas Abreu
Solution 2 Michael Connor