This is what you build to juggle 6,000 tweets a second. That’s the headline that caught my eye, and the story is about the challenges Twitter faces with the data it handles. Twitter definitely has a tough problem, one that few of us share, but perhaps its experience with an edge case can help us learn to deal better with our own data.
The story is journalistic rather than technical, but it is interesting. Twitter has struggled with a blend of data: some of it is crucial and must be consistent immediately (usernames), while other data can be a bit out of date (tweets). They also have lots of unstructured data (photos and video) that is combined with more traditional, structured data. They’ve used a few different database platforms to store this data and assemble it in their application. That’s the same thing most of us do when we deal with many different types of data.
However, Twitter is trying to find a way around dealing with disparate systems. They’ve had a number of engineers working on Manhattan, their database designed to handle both structured and unstructured data. And because this is Twitter, the platform is designed to manage all of this data under very high workload demands at scale.
It will be interesting to see if they come up with any innovative ideas. Certainly SQL Server already has options for managing structured and unstructured data, though perhaps not at the scale Twitter needs.