What data catalog options are available in open source? [closed]

I would like to know which is the best available data catalog that meets the following requirements:

  • Open source
  • Highly available behind a load balancer
  • Describes the dataset
  • Can describe the data inside the datasets, i.e. describe the individual fields inside the data
  • Datasets are searchable
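To make the last three requirements concrete, here is a toy, in-memory sketch in Python of what "describe the dataset", "describe the fields inside it" and "searchable" amount to. All class and function names are hypothetical and not taken from CKAN, Amundsen or any other catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDescription:
    """One column/field inside a dataset (hypothetical model)."""
    name: str
    dtype: str
    description: str


@dataclass
class Dataset:
    """A catalog entry: dataset-level metadata plus field-level metadata."""
    name: str
    description: str
    fields: list[FieldDescription] = field(default_factory=list)


def search(catalog: list[Dataset], term: str) -> list[str]:
    """Return names of datasets whose own text or any field's text contains term."""
    term = term.lower()
    hits = []
    for ds in catalog:
        texts = [ds.name, ds.description]
        # Field-level descriptions are searched too, not only the dataset summary.
        texts += [f"{f.name} {f.description}" for f in ds.fields]
        if any(term in t.lower() for t in texts):
            hits.append(ds.name)
    return hits


catalog = [
    Dataset("customers", "Customer master data", [
        FieldDescription("customer_id", "int", "Primary key"),
        FieldDescription("email", "str", "Contact e-mail address"),
    ]),
    Dataset("orders", "Orders placed in the web shop", [
        FieldDescription("order_id", "int", "Primary key"),
        FieldDescription("customer_id", "int", "References customers.customer_id"),
    ]),
]

print(search(catalog, "e-mail"))       # ['customers']
print(search(catalog, "primary key"))  # ['customers', 'orders']
```

A dataset-only catalog would miss the first query, because "e-mail" appears only in a field description. Real catalogs delegate the search part to an index such as Solr (used by CKAN) or Elasticsearch (used by Amundsen) rather than scanning linearly.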

I looked at CKAN and was really impressed, but it did not offer a way to describe the fields inside the datasets.

Is there any other good tool for this?



Solution 1:[1]

You can actually extend CKAN and mold it to your needs. CKAN is a very versatile and flexible product.

For example, to describe fields inside CKAN datasets, you can use ckanext-scheming: https://github.com/ckan/ckanext-scheming
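ckanext-scheming lets you replace CKAN's default dataset form with a schema file (JSON or YAML) that lists the metadata fields of datasets and resources. The Python sketch below writes such a file. The key layout (`scheme_version`, `dataset_type`, `dataset_fields`, `resource_fields`) and the presets follow the example schemas shipped in the ckanext-scheming repository as far as I recall, so treat them as assumptions to check against that repo. The `field_descriptions` field and the output file name are hypothetical.

```python
import json


def build_schema() -> dict:
    """Build a minimal dataset schema in the layout of ckanext-scheming's examples.

    Assumption: key names and presets mirror the project's example schemas.
    The "field_descriptions" field is a hypothetical custom field.
    """
    return {
        "scheme_version": 2,
        "dataset_type": "dataset",
        "about": "Example schema with a free-text field describing the data's fields",
        "about_url": "https://github.com/ckan/ckanext-scheming",
        "dataset_fields": [
            {"field_name": "title", "label": "Title", "preset": "title"},
            {"field_name": "name", "label": "URL", "preset": "dataset_slug"},
            {"field_name": "notes", "label": "Description", "form_snippet": "markdown.html"},
            {"field_name": "owner_org", "label": "Organization", "preset": "dataset_organization"},
            # Hypothetical custom field: markdown text documenting each column.
            {"field_name": "field_descriptions", "label": "Field descriptions",
             "form_snippet": "markdown.html"},
        ],
        "resource_fields": [
            {"field_name": "url", "label": "URL", "preset": "resource_url_upload"},
            {"field_name": "name", "label": "Name"},
            {"field_name": "format", "label": "Format", "preset": "resource_format_autocomplete"},
        ],
    }


if __name__ == "__main__":
    # Write the schema to a file; the file name is arbitrary.
    with open("my_dataset_schema.json", "w", encoding="utf-8") as fh:
        json.dump(build_schema(), fh, indent=2)
```

You would then enable the `scheming_datasets` plugin and point the `scheming.dataset_schemas` option at the file. Check the README for the exact path syntax. A free-text field like this is only a simple way to document columns; a custom field type or extension can give more structure.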

Alternatively, you can create your own extension by following the documentation: https://docs.ckan.org/en/2.8/extensions/

Solution 2:[2]

Have you looked at Lyft's open source data catalog and discovery tool called "Amundsen"?

https://github.com/lyft/amundsen

https://eng.lyft.com/open-sourcing-amundsen-a-data-discovery-and-metadata-platform-2282bb436234

Solution 3:[3]

I have used the data catalog component of the Engrafo solution (engrafo.eu). Against your requirements:

  • Open source (free plan for 10 users)
  • Highly available behind a load balancer (?)
  • Describes the dataset (yes)
  • Can describe the data inside the datasets, i.e. describe the fields inside the data (yes)
  • Datasets are searchable (yes)

A demo of the data catalog

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: Shubham Mahajan
Solution 2: Joseph True
Solution 3: James