What's a "Long-Running" Task?
You’re just starting out choosing which components are going to make up your Mesos cluster, and there are a lot of options. Things like Marathon, Chronos, and Aurora can run long-running or cron-style tasks. But, you say, what does “long-running” actually mean? My Hadoop jobs can take a long time; does that mean they’re long-running? It’s a weird piece of jargon to get your head around the first time, but everyone seems to use it. So, let’s get a definition.
A long-running task could also be described as a “permanent” task. That is, you start it and then it never stops, and it should always be restarted if it exits. Web apps are a good example of this: you want them to always be available and listening on their assigned port, and if they crash for any reason they should be restarted.
Contrast this with a Hadoop job: it has a definite end time (when the job’s processing work is done), and if it exits it should not unconditionally restart. Of course, it may need to restart after a job failure, but you don’t want your Hadoop jobs running over and over in a never-ending elephant stampede, especially if they’re failing.
How about workloads that need to be run over and over? For example, a daily ETL process? Are those long-running? Nope! Those should be restarted on a schedule, not on exits.
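If it helps to see the three cases side by side, here is a minimal Python sketch of the restart rule described so far. It is not how any particular framework implements restarts; the names TaskKind, should_restart_on_exit, and retries_left are made up for illustration, and the bounded-retry rule for batch jobs is just one assumed way to avoid the stampede.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical labels for the three kinds of workload discussed above."""
    LONG_RUNNING = "long-running"  # e.g. a web app
    BATCH = "batch"                # e.g. a Hadoop job
    SCHEDULED = "scheduled"        # e.g. a daily ETL process


def should_restart_on_exit(kind, exit_code, retries_left=0):
    """Decide whether a task should be relaunched right after it exits."""
    if kind is TaskKind.LONG_RUNNING:
        # "Permanent" tasks come back no matter why they exited.
        return True
    if kind is TaskKind.BATCH:
        # Assumption: finite jobs are retried only after a failure,
        # and only while some retries remain.
        return exit_code != 0 and retries_left > 0
    # Scheduled tasks are started by the clock, not by an exit.
    return False


if __name__ == "__main__":
    print(should_restart_on_exit(TaskKind.LONG_RUNNING, 0))
    print(should_restart_on_exit(TaskKind.BATCH, 1, retries_left=2))
    print(should_restart_on_exit(TaskKind.SCHEDULED, 1))
```

The only branch that ignores the exit code is the long-running one, which is really the whole definition in one line.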
And what about frameworks themselves? Are those long-running? Let’s apply the definition: are they always on and should they always be restarted? In most cases, yes! Frameworks like Marathon and Chronos are long-running, even though they can schedule short-term tasks. There are a few exceptions, like running individual Spark jobs on Mesos, but most of the frameworks you will run will be long-running.
Here’s a graphic to drive this home:
With this definition in hand, you can now choose an appropriate framework for your workload. If you know you have a long-running task, give Marathon a shot. It can schedule work like web services and cache processes, as well as other frameworks. If you have something you want to run periodically, try Chronos or Aurora, both of which have cron-like scheduling options. And if you have something that you just want to run on-demand, the instructions for your framework will probably have recommendations on how to make sure the jobs run to completion successfully.
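As a rough cheat sheet, the recommendations above can be written down as a tiny Python lookup. This is only a sketch of the advice in this post, not an exhaustive or official mapping; the workload labels and the suggest_frameworks helper are invented for illustration.

```python
# Suggestions taken from the paragraph above; the keys are made-up labels.
SUGGESTIONS = {
    "long-running": ["Marathon"],
    "periodic": ["Chronos", "Aurora"],
    # On-demand work: no single suggestion, check your framework's own docs.
    "on-demand": [],
}


def suggest_frameworks(workload):
    """Return the frameworks this post suggests trying for a workload type."""
    if workload not in SUGGESTIONS:
        raise ValueError(f"unknown workload type: {workload!r}")
    # Return a copy so callers cannot modify the shared table.
    return list(SUGGESTIONS[workload])


if __name__ == "__main__":
    for kind in SUGGESTIONS:
        print(kind, "->", suggest_frameworks(kind) or "see your framework's docs")
```

An empty list for on-demand work is deliberate: the advice there depends on the framework you are already running.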