'Mysql : How to run heavy analytical query at real time

I am running a crm application which uses mysql database. My application generating lots of data in mysql. Now i want to give my customer a reporting section where admin can view real time report, they should be able to filter at real time. Basically i want my data to be slice and dice at real time fast as possible. I have implemented the reporting using mysql and php. But now as data is too much query takes too much time and page does not load. After few read i came across few term like Nosql, mongoDb , cassandra , OLAP , hadoop etc but i was confuse which to choose. Is there any mechanism which would transfer my data from mysql to nosql on which i can run my reporting query ans serve my customer keeping my mysql database as it is ?



Solution 1:[1]

It doesn't matter what database / datastore technology you use for reporting: you still will have to design it to extract the information you need efficiently.

Improving performance by switching from MySQL to MongoDB or one of the other scalable key/value store systems is like solving a pedestrian traffic jam by building a railroad. It's going to take a lot of work to make it help the situation. I suggest you try getting things to work better in MySQL first.

First of all, you need to take a careful look at which SQL queries in your reporting system are causing trouble. You may be able to optimize their performance by adding indexes or doing other refactoring. That should be your first step. MySQL has a slow query log. Look at it.

Secondly, you may be able to add resources (RAM, faster disks, etc) to MySQL, and you may be able to tune it for higher performance. There's a book called High Performance MySQL that offers a sound methodology for doing this.

Thirdly, many people who need to add a reporting function to their busy application use MySQL replication. That is, they configure one or two slave MySQL servers to accept copies of all data from the master server.

http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html

They then use the slave server or servers to run reporting queries. The slaves are ordinarily a few seconds or minutes behind the master (that is, they're slightly out of date). But it usually is good enough to give users the illusion of real-time reporting.

Notice that if you use MongoDB or some other technology you will also have to replicate your data.

Solution 2:[2]

I will throw this link out there for you to read which actually gives certain use cases: http://www.mongodb.com/use-cases/real-time-analytics but I will speak for a more traditional setup of just MongoDB.

I have used both MySQL and MongoDB for analytical purposes and I find MongoDB better suited, if not needing a little bit of hacking to get it working well.

The great thing about MongoDB when it comes to retreiving analytical data is that it does not require the IO/memory to write out a separate result set each time. This makes reads on a single member of a replica set extremely scalable since you just add your analytical collections to the working set (a.k.a memory) and serve straight from those using batch responses (this is the default implementation of the drivers).

So with MongoDB replication rarely gives an advantage in terms of read/write, and in reality with MySQL I have found it does not either. If it does then you are doing the wrong queries which will not scale anyway; at which point you install memcache onto your database servers and, look, you have stale data being served from memory in a NoSQL fashion anyway...whoop, I guess.

Okay, so we have some basic ideas set out; time to talk about that hack. In order to get the best possible speed out of MongoDB, and since it does not have JOINs, you need to flatten your data so that no result set will even be needed your side.

There are many tactics for this, but the one I will mention here is: http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/ pre-aggregated reports. This method also works well in SQL techs since it essentially is the in the same breath as logically splitting tables to make queries faster and lighter on a large table.

What you do is you get your analytical data, split it into a demomination such as per day or month (or both) and then you aggregate your data across those ranges in a de-normalised manner, essentially, all one row.

After this you can show reports straight from a collection without any need for a result set making for some very fast querying.

Later on you could add a map reduce step to create better analytics but so far I have not needed to, I have completed full video based anlytics without such need.

This should get you started.

Solution 3:[3]

TiDB may be a good fit https://en.pingcap.com/tidb/, it is MySQL compatible, good at real-time analytics, and could replicate the data from MySQL through binlog.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 O. Jones
Solution 2 Sammaye
Solution 3 shenli3514