'What are the options to store GTFS Data for Android App
I am learning android developement in Java. As part of my learning I have started a project working on Transit App using GTFS data.
I have got two concerns:
data File Size:
- My sqlite db file size has grown to 760MB, since I have denormalized GTFS data and created additional tables and indexes for performance gains. GTFS stop_time record has around 4 Million records. Is there a way to reduce the db file size? or is there any other approach besides sqlite db to store data? I did try to store data required by my app using FlatBuffer to a file but that did not seem to reduce the data.
Performance: As mentioned, I tried using FlatBuffer to speed up the data fetch and it seems promising but the file size seems to be the drawback not to use Flatbuffer. Since you would not be able filter data on fields to the extent as we can in sqlite db. I tried using FlatBuffer for a single trip from Station A to Station B.
Any expert advice would be greatly appreciated?
Thanks
Solution 1:[1]
At 4 million rows even the simplest of tables e.g. CREATE TABLE IF NOT EXISTS testit (id INTEGER PRIMARY KEY);
is going to take up 33Mb.
- noting that this is very much optimal as INTEGER's are stored very efficiently and there is no overhead due additional indexes.
Add a single unique column e.g. CREATE TABLE IF NOT EXISTS testit (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
Then based upon storing A + the id value in the name column e.g.
....
The the size jumps to 133Mb.
Without UNIQUE then just 68Mb (i.e. indexes have a cost in space (the UNIQUE constraint utilises an index)).
Test based upon using Navicat and the following SQL :-
DROP TABLE IF EXISTS testit;
CREATE TABLE IF NOT EXISTS testit (id INTEGER PRIMARY KEY/*, name TEXT UNIQUE*/);
WITH cte(counter) AS (SELECT 1 UNION ALL SELECT counter+1 FROM cte LIMIT 4000000)
INSERT OR IGNORE INTO testit SELECT counter/*,'A'||counter*/ FROM cte;
VACUUM;
-- DROP TABLE IF EXISTS testit;
- The last line is uncommented to facilitate a run that cleans up the testing environment.
- Note the use of VACUUM
- the commented out clauses introduced/commented out for the different results.
Considering the above then there is probably not that much scope to reduce the size. Especially considering a quick look at https://developers.google.com/transit/gtfs/examples/gtfs-feed
Saying that there are possibilities that some data can be stored using less space.
Id's appear to be stored as text, utilising integers (long). Would save some space and also be more efficient. That is you use INTEGER PRIMARY KEY
for the columns. That would then make them an alias of the rowid which will be faster than other indexes. see https://www.sqlite.org/rowidtable.html
- obviously references/ relationships would have to be amended accordingly.
- this would not only reduce the size of the table, but also the size of the index (primary key).
- The downside could be accommodating/converting new/changed/downloaded data
date/times
Dates and times appear to be universally textual these could be stored, as integers which would then reduce the size. If stored in a manner consistent with the formats understood by SQLite's Date/Time functions then the data could be extracted in suitable formats. see https://sqlite.org/lang_datefunc.html
Don't you mean normalized? De-normalizing could increase the amount of data due to duplication of data.
See VACUUM (https://www.sqlite.org/lang_vacuum.html) it may be useful.
Consider the logging mode see https://sqlite.org/wal.html. An entity (Table's Trigger's, Views, Indexes) will take up a minimum of a page (default page size is 4k).
Index size will be relative to the size of the table being indexed, so use indexes judiciously.
I'd suggest using an SQLite tool such as DB Browser for SQLite to determine the optimal schema.
Use EXPLAIN/EXPLAIN QUERY PLAN to assist in optimisation of queries, see https://sqlite.org/lang_explain.html.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | MikeT |