Posts written by Allan Bunch

Big Data for Grownups - A Practical Solution

If data is your thing (or even if it’s not!), big data is a phrase you hear many times a day, describing everything and, all too often, nothing at all. Wikipedia currently defines big data as “a term for data sets that are so large or complex that traditional data processing applications are inadequate.” Interesting. By that definition, it’s safe to say that the reality of big data has been with us for a very long time. I’m sure we’ve all been on one side or another of a traditional data platform as it churned away on a rogue query while struggling to ingest the latest craze in 3rd-party marketing data. Yeah. We had to fix that.

Marketing hype aside, big data proper is just data, and a lot of it. So much of it that turning that data into timely information on a vertically scaled RDBMS eventually becomes virtually impossible. Even well-executed replication and partitioning strategies quickly become unmanageable at today’s scale. For these reasons and more, we turn to the architectural approach known as Big Data.

Look at all the shiny new toys!

Navigating the big data technology landscape is quite an adventure. We have free, open source, and commercial solutions for everything imaginable. One big data technology may appear to address the exact same problem as the next. Some of these platform components perform their duties exceptionally well. Others flat out fail. But choose the right parts and you’ll immediately appreciate all the hype. A thoughtfully crafted big data stack is an amazing tool for any business.

That realization is what guided us as we evaluated potential Praxis Stack components. We set out to create a modern big data solution that simply works. The result is a grouping of tools that delivers durable, predictable performance at any scale.

Praxis Big Data Stack

The Praxis Stack relies on the Hadoop Distributed File System (HDFS) and is threaded with YARN and Apache Spark. We found this to be a great balance of performance, functionality, and extensibility. We typically deploy the Praxis Stack to AWS, though any Hadoop-capable environment works brilliantly.

A modern big data solution that just works.

Take a peek beyond the stack’s core and you uncover a wonderfully orchestrated distributed system. Praxis Stack is extremely flexible by design, which makes it a fit for tech teams of all shapes and sizes. Check out the players:

* Airpal: a graphical query interface for Presto; from the Airbnb team.
* Consul: the best thing going in distributed system service discovery.
* Couchbase: the amazingly scalable distributed document-oriented database.
* Elasticsearch: so much data; not enough discovery.
* Kibana: visualize everything.
* Luigi: workflow and job pipeline management tool from the team at Spotify.
* Presto: a distributed SQL engine.
* RabbitMQ: because where there are messages there should be queues!

You might be wondering about a few familiar names that are absent from the Praxis Stack. Probably the most notable is Kafka. We chose to drop in RabbitMQ 3.6 with lazy queues enabled and we’ve been very pleased with the decision. The balance of high availability and throughput is fantastic. RabbitMQ is also well understood by engineers and developers beyond enterprise data processing teams. This makes RabbitMQ a great choice given Praxis Stack design goals.
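For readers who haven't used lazy queues, here is a minimal sketch of how one might be declared from Python. It assumes a pika-style channel object; the helper name, the queue name, and the small recording stub are hypothetical and exist only so the example runs without a broker. The one RabbitMQ-specific detail is the `x-queue-mode` queue argument.

```python
# Queue argument that asks RabbitMQ to keep messages on disk rather than in RAM.
LAZY_QUEUE_ARGS = {"x-queue-mode": "lazy"}


def declare_lazy_queue(channel, name):
    """Declare a durable queue in lazy mode on a pika-style channel.

    `channel` is assumed to expose queue_declare(queue=..., durable=...,
    arguments=...), as pika's BlockingChannel does.
    """
    return channel.queue_declare(
        queue=name,
        durable=True,
        arguments=dict(LAZY_QUEUE_ARGS),
    )


class RecordingChannel:
    """Hypothetical stand-in for a real channel: records calls, sends nothing."""

    def __init__(self):
        self.calls = []

    def queue_declare(self, queue, durable=False, arguments=None, **kwargs):
        self.calls.append(
            {"queue": queue, "durable": durable, "arguments": arguments}
        )
        return None


# Dry run against the stub; "ingest.events" is an illustrative queue name.
demo_channel = RecordingChannel()
declare_lazy_queue(demo_channel, "ingest.events")
```

Against a live broker, the same helper could be handed a real channel, for example `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`, where the host name is an assumption about your environment.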

Choose the right parts … appreciate all the hype!

A few solid contenders really shone during our evaluation phase. We checked out Airbnb’s Superset (formerly Caravel), a general-purpose data visualization tool that looks very promising. It’s only a matter of time before Kibana finds itself sharing the data visualization space with Superset.

Also from Airbnb is Airflow. It’s a beautiful workflow and job pipeline management tool that, like Luigi, follows a DAG (directed acyclic graph) model for job processing. Both are written in Python and have similar intents, but they take distinct approaches. We’re definitely keeping an eye on Airflow.
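To make the DAG model concrete, here is a minimal sketch in plain Python using the standard library's `graphlib` (Python 3.9+). It is not Luigi's or Airflow's API; the job names and helper functions are hypothetical. It only illustrates the shared idea: each job declares what it depends on, and a job runs only after all of its dependencies have finished.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a job, each value the set of jobs it needs first.
PIPELINE = {
    "extract_events": set(),
    "extract_users": set(),
    "join_sessions": {"extract_events", "extract_users"},
    "load_report": {"join_sessions"},
}


def run_order(dag):
    """Return one valid execution order for the jobs in `dag`."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Report whether `dag` is a valid DAG (that is, it has no dependency cycle)."""
    try:
        run_order(dag)
        return True
    except CycleError:
        return False


def run_pipeline(dag, run_job):
    """Call run_job(name) for every job, dependencies first; return the order used."""
    completed = []
    for job in run_order(dag):
        run_job(job)
        completed.append(job)
    return completed
```

Calling `run_pipeline(PIPELINE, print)` would print the job names in an order where both extract jobs come before `join_sessions`, and `load_report` comes last. Real schedulers add what this sketch leaves out, such as retries, parallel execution, and tracking which outputs already exist.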

I’m interested in hearing about your data platform experiences. We’re very pleased with the current state of the Praxis Stack, but we love to explore new ideas and suggestions. Our goal is a durable, yet practical big data stack that just works!

Social Media Marketing - Offbeat Moment Showcased In Flipboard #MagsWeLove

5 Flipboard Social Media Marketing Tips

You’ve probably asked yourself: Should Flipboard be part of my brand’s social media marketing strategy? Great question. After all, you tweet your latest blog post headline. You share insider specials with your Facebook fans. You show off your fabulous product photos on your Pinterest boards. All with great success, right? So why Flipboard? Soon after the service’s launch, The Next Web posted a great writeup by Alex Wilhelm, answering this question. In this post, I share a few tips specific to brands.