What open source data catalog options are available? [closed]
I would like to know which data catalog best meets the following requirements:
- Open source
- Highly available behind a load balancer
- Describes the dataset
- Can describe the data inside the datasets, with the option to describe individual fields
- Datasets are searchable
I looked at CKAN and was really impressed, but it did not offer a way to describe the fields inside the datasets.
Is there another good tool that does this?
Solution 1:[1]
You can extend CKAN and adapt it to your needs; it is a versatile and flexible product.
For example, to describe fields inside CKAN you can use https://github.com/ckan/ckanext-scheming
or you can create your own extension by following the documentation: https://docs.ckan.org/en/2.8/extensions/
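As a rough illustration of the ckanext-scheming route, the sketch below builds a dataset schema as a Python dict and serializes it to JSON, which is one of the schema formats the extension reads. The dataset type `described_dataset`, the `columns` field, and its subfield names are hypothetical, and the use of `repeating_subfields` for per-column descriptions is an assumption to verify against the ckanext-scheming README for your CKAN version.

```python
import json

# Hypothetical schema: names such as "described_dataset" and "columns" are
# illustrative only; check the keys against the ckanext-scheming README.
schema = {
    "scheming_version": 2,
    "dataset_type": "described_dataset",
    "about": "Dataset type with per-column descriptions",
    "dataset_fields": [
        {"field_name": "title", "label": "Title", "preset": "title"},
        {"field_name": "name", "label": "URL", "preset": "dataset_slug"},
        {"field_name": "notes", "label": "Description"},
        {
            # Assumed approach: one repeating group per column in the data.
            "field_name": "columns",
            "label": "Columns",
            "repeating_subfields": [
                {"field_name": "column_name", "label": "Column name"},
                {"field_name": "column_type", "label": "Data type"},
                {"field_name": "column_description", "label": "Description"},
            ],
        },
    ],
    "resource_fields": [
        {"field_name": "url", "label": "URL", "preset": "resource_url_upload"},
    ],
}


def column_subfields(dataset_schema):
    """Return the subfield names used to describe each column."""
    for field in dataset_schema["dataset_fields"]:
        if field["field_name"] == "columns":
            return [sub["field_name"] for sub in field["repeating_subfields"]]
    return []


# JSON text that could be saved as a schema file for the extension to load.
schema_json = json.dumps(schema, indent=2)
```

The resulting JSON would then be referenced from the CKAN configuration as described in the extension's README; the exact setting and file location depend on your installation.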
Solution 2:[2]
Have you looked at Lyft's open source data catalog and discovery tool called "Amundsen"?
https://github.com/lyft/amundsen
https://eng.lyft.com/open-sourcing-amundsen-a-data-discovery-and-metadata-platform-2282bb436234
Solution 3:[3]
I have used the data catalog part of the engrafo solution (engrafo.eu). Against your requirements:
- Open source (free plan for 10 users)
- Highly available behind load balancer (?)
- Describes the dataset (yes)
- Can describe the data inside the datasets, with the option to describe the fields inside the data (yes)
- Data set searchable (yes)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Shubham Mahajan |
Solution 2 | Joseph True |
Solution 3 | James |