Matt’s alone in the studio today – Richard is busy working on updating our courses, so Matt took the opportunity to interview Jon Humble, principal engineer at Sky Betting and Gaming. Find out about the technology stack Jon uses to support systems dealing with huge volumes of data and how they decide whether to use Scala or Java for different jobs.
Matt:Hello, and welcome to podcast number 12 of All Things Java. I'm Matt Greencroft. Apologies for the little pause there. Normally at this point, my colleague Richard Chesterwood would jump in and say, "And I'm Richard Chesterwood," but I'm all alone this week. He's not here. Actually, Richard has been busy working on updating some of our courses. You may be aware that on quite a few of our courses we show you how you can use Amazon's EC2 instances to deploy your work onto a cloud-based infrastructure. Well, unfortunately for us, Amazon have changed some things recently, which has meant that we've needed to update some of our videos to make sure that they work with Amazon's newer versions. So, I've locked Richard away in the recording studio for the last few weeks, getting everything right, but I promise I am going to let him out to join in the next podcast. I should perhaps also apologise for the delay in releasing this podcast. We have been super busy over the last couple of months but we'll try to do our best to get a bit more regular from now on.
So I'm actually not quite alone this month. In the studio today we've got Jon Humble with us. Jon works for one of the largest tech employers in our local area, Sky Betting and Gaming. Jon is one of their principal engineers and he's a real expert in Scala, Akka, Erlang and agile programming in general. So thank you very much Jon for joining us in the studio today.
Matt:And we first met you at a talk that you did that was comparing and contrasting, I guess, Scala and Java, so I thought it might be nice to get you in and just talk to you a bit about what you're using in your day to day work and then how you make decisions around what's the right language or tool set for a particular job. Would you mind perhaps just starting by telling us a bit about the kind of applications that you work on, the kind of things that Sky Betting and Gaming need?
Jon:Sure. I think that the best way to visualise this is to imagine all of our customers right now playing on Sky Vegas which is virtual slot machines. And every time they play they spin one of the slot machines and we get an event. And the event will tell us what happened to the player, whether they won et cetera, et cetera. On the one hand that seems fairly simplistic but when you've got upwards of 600 or 700 of these a second it gets quite interesting. We use a technology stack using Kafka which is a distributed log if you like. We use Scala to process that. We could perhaps go into why I would prefer Scala to Java in that scenario.
Matt:Okay, so before we get into that bit then, so Kafka for those who haven't come across it, do you describe that as a distributed log?
Jon:Oh it certainly is distributed over machines. I suppose you can think of Kafka as, it's a bit like a broker, if you think in a traditional sense, but imagine a document and every time an event comes in you write it on the next line and keep appending. It's an append-only log. And there's typically two views in the Kafka world. One is your event stream, which is every message that you've ever received, and then you may have what's called a sort of compacted view of that, which is where you show the latest message for each key. So if you imagine these things are keyed, you might get a key that says, "Customer one has spun and lost. Customer two has spun and won. Customer two spun and lost." And then when you roll that up the last thing that happened to customer two was he spun and lost.
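Editor's note: the two views Jon describes can be sketched with a small in-memory model. This is only an illustration in Java; the class `CompactionSketch` and its methods are hypothetical names invented for this note and are not part of the Kafka API, and real Kafka compaction runs in the background on the brokers rather than on every read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an append-only, keyed log (not the Kafka API).
class CompactionSketch {

    // One keyed event, e.g. key "customer-2", value "spun and lost".
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Events are only ever added to the end of this list.
    private final List<Event> log = new ArrayList<>();

    void append(String key, String value) {
        log.add(new Event(key, value));
    }

    // View 1: the event stream, i.e. every event ever received, in arrival order.
    List<Event> eventStream() {
        return Collections.unmodifiableList(new ArrayList<>(log));
    }

    // View 2: the compacted view, i.e. only the latest value for each key.
    Map<String, String> compactedView() {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Event e : log) {
            latest.put(e.key, e.value); // a later event overwrites an earlier one
        }
        return latest;
    }

    public static void main(String[] args) {
        CompactionSketch sketch = new CompactionSketch();
        sketch.append("customer-1", "spun and lost");
        sketch.append("customer-2", "spun and won");
        sketch.append("customer-2", "spun and lost");

        System.out.println(sketch.eventStream().size());                // 3
        System.out.println(sketch.compactedView().get("customer-2"));   // spun and lost
    }
}
```

The event stream still holds all three events, while the compacted view holds one entry per customer, which matches Jon's "roll that up" description.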
Matt:Right, okay, lovely. You said then you gave the game away that you would probably be choosing Scala rather than Java to be processing those messages from Kafka. So what's the reason for that?
Jon:Some of it I suppose is a personal preference. You certainly can do it in Java and in fact the Kafka Streams library is written in Java and it provides a Java API. So why would you then choose to use Scala to do it? Well a lot of this for me comes down to the declarative nature of Scala. If you think about what we're doing here you've got a stream of events and with a stream of events what you're typically going to do is you're going to map, filter and then fold and these sorts of operations. The beauty of a declarative language really is that if you say, "Do a mapping on this collection," what you'll get is another collection with that mapping applied.
Think about what's really happening there. If you were doing that in C or some sort of imperative language, what you'd have is a loop. And you'd loop around the collection and for each item you would apply a transform. But because Scala is declarative you don't have to bother with the loop, it's implicit. So all you do is you say, "I want this to happen to that collection." And it happens. And that's really powerful. And when you get to chain those things together, all of a sudden you've got a pipeline of transformations, filters, folds, that then lead you very simply to an outcome that would be quite a lot of code in a more imperative type language.
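Editor's note: the contrast between an explicit loop and a declarative pipeline can be sketched as follows. The sketch is in Java (using the streams API from Java 8 onwards) to match the rest of this podcast; the `Spin` type and its fields are assumptions made up for this note, not Sky Betting and Gaming's real event format, and a Scala version would use the same filter, map and fold steps.

```java
import java.util.Arrays;
import java.util.List;

class PipelineSketch {

    // Hypothetical spin event: who played and what they won (0 means a loss).
    static final class Spin {
        final String customer;
        final int winnings;

        Spin(String customer, int winnings) {
            this.customer = customer;
            this.winnings = winnings;
        }
    }

    // Imperative style: the loop and the running total are written out by hand.
    static int totalWinningsLoop(List<Spin> spins) {
        int total = 0;
        for (Spin spin : spins) {
            if (spin.winnings > 0) {
                total += spin.winnings;
            }
        }
        return total;
    }

    // Declarative style: say what should happen; the iteration is implicit.
    static int totalWinningsPipeline(List<Spin> spins) {
        return spins.stream()
                .filter(spin -> spin.winnings > 0)  // keep only winning spins
                .map(spin -> spin.winnings)         // transform each event to an amount
                .reduce(0, Integer::sum);           // fold the amounts into one total
    }

    public static void main(String[] args) {
        List<Spin> spins = Arrays.asList(
                new Spin("customer-1", 0),
                new Spin("customer-2", 50),
                new Spin("customer-1", 20));

        System.out.println(totalWinningsLoop(spins));      // 70
        System.out.println(totalWinningsPipeline(spins));  // 70
    }
}
```

Both methods give the same answer; the second one reads as a chain of the map, filter and fold operations Jon mentions.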
Matt:Okay. So but if you were going to attack that in Java then as opposed to Scala, given that you can do functional-esque coding now in Java since I think it's Java 8, isn't it? Is it much more work to do it in Java than it would be in Scala? Or is it just simply that it looks neater?
Jon:Yeah, I think it's a bit of both. I mean I suppose if you are a functional programmer you would tend to look at what Java have done and feel it's a little bit tacked on. And it's a little bit heavy in boilerplate. Lots of angle brackets, types and all this. Scala's got type inference so a lot of that stuff goes away. Yes, I mean look, it's got a lot better in Java, don't get me wrong, but nevertheless I think if you really want to do it in a functional way, use a functional programming language.
And I think we talked about this at the talk that you were there at, there's about three ages of programming, if you like. We started with C-type programs, which were very much if statements and not organising your data and your methods together, just having it quite haphazard. And that became a problem when programs started getting really big. And then object orientation came along so you're encapsulating your data with your functions. And that was kind of great but then multiple processors came along and all of a sudden thread locking became a problem, so then functional programming kind of started to make more sense because it doesn't have those problems, with immutable data structures and what have you. It's a kind of a progression.
Interestingly enough, that's not a progression in time because functional programming's been around for ages, it just didn't necessarily make sense until very recently.
Matt:Okay, okay, thank you. Can I just get a sense of what are the other kinds of systems or technology stacks that you tend to be involved with on a day to day basis?
Jon:Yeah, absolutely. I mean like I said, we're mostly using Scala with Kafka. On top of that we use Docker as our containerization engine. We use Rancher as our orchestration engine. And then on top of that some of our colleagues, they'll use Hadoop. So MapReduce. You can kind of see two sides of this: what we do with Kafka and Scala is like real time processing and what they do in the Hadoop, HDFS world is more number crunching over the longer term. And you can actually combine those quite powerfully together. If you have the ability to have looked at some data points over a long time and then to be able to provide a precis of this to the real time processing, that can be really powerful.
Matt:Yes. So Hadoop, that's interesting because we taught Hadoop for a number of years and a lot of our customers are now saying to us that they're moving from Hadoop to Spark. Again, in terms of where your colleagues are that are using that, are they moving across to Spark typically there? Or is Hadoop still really key?
Jon:I think Hadoop is still really key for us because it's the technology of the data warehouse rather than the data processing. So Hadoop is the way that we are able to maintain massive data lakes.
Matt:That being HDFS, you're saying.
Jon:Absolutely HDFS. Whereas I suppose Spark is more like how you would process stuff in or out of those data lakes.
Matt:Yes, okay, that makes sense, thank you. Great. So that's all interesting to know. What about new stuff looking forward then? If you were starting a new project today, what would your thought process be around deciding what the languages are, and what technology stack you want to be using for that?
Jon:Yeah, so I suppose I always want to take a horses for courses approach. I may have a preference for Scala but if I was building a website I probably wouldn't do it in Scala because it wouldn't be the appropriate technology. For example, if I was building something that wanted RESTful interfaces I've always found that really easy to do with something like Jersey in Java. So the JAX-RS interface is just really excellent. I've always enjoyed using that. So again, although there are options in the Scala world for doing that, my first thought might be, well maybe we should do it in Java like that.
And then I suppose it's to do with throughput processing. One of the things about Scala is it's meant to be the scalable language. That's where its name comes from. The idea is that if you are declarative about how you approach your programming, how that then gets distributed across a whole lot of processes or compute power can then be delegated downstream. You don't need to worry about it. Whereas in something much more low level like C or C++ you're all over that.
I think that's a consideration especially with modern day processors. You've got the ability often to shift the load around and maybe use technologies like OpenGL that'll allow you to use your graphics card as compute power. All these things are transparent to the language that you're using. And being declarative helps with that. It helps with parallelization, it helps with offloading that thought process of where to run it.
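Editor's note: one small way to see "where to run it" being delegated is Java's parallel streams, where the same declarative pipeline can run on one thread or be split across cores without the pipeline itself changing. This is a sketch in Java rather than Scala, and `ParallelSketch` and `sumOfSquares` are hypothetical names invented for this note; it says nothing about how Sky Betting and Gaming distribute their own workloads.

```java
import java.util.stream.LongStream;

class ParallelSketch {

    // Sum of the squares of 1..n, described as a pipeline rather than a loop.
    // The 'parallel' flag only changes how the runtime schedules the work.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel(); // let the runtime split work across cores
        }
        return numbers.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // 333833500
        System.out.println(sumOfSquares(1000, true));  // 333833500
    }
}
```

The description of the computation is identical in both calls; only the scheduling differs, which is the kind of decision Jon describes handing off.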
Matt:Okay. So one question that always sort of occurs to me is that if I was applying for a job right now and you were interviewing me, if Java is my core language, how much knowledge of one of the other JVM languages would you expect or want to see in a new recruit? What's your ...
Jon:Yeah, I mean I think that actually that angle goes towards explaining one of the weaknesses of Scala if you like, which is that hiring Scala developers is really, really hard and I know because I’m trying to do it. Give me a shout if you're out there. But yeah, I think that it's always really helpful as a programmer to be a polyglot programmer. Just like in life if you can speak multiple languages, multiple natural languages, that's a real benefit and it's the same with programming. I think when you contrast Java with Scala, you're looking at two different worlds in a way. Java's your traditional object-oriented programming language. Scala's a bit multi-paradigm but I tend to focus very much on the functional side of that. If you are able to do both of those you have the mindset of both an imperative and a functional programmer and that's a really powerful thing.
Matt:Okay. So actually, most programmers today have probably also gone through that progression. They have maybe started on the object-oriented bit and progressed to functional. Are there programmers coming round today who are actually starting on the functional side? Because there's now the recognition that that's what's needed.
Matt:Yeah, okay, lovely.
Well thank you once again for your time Jon. I think I've used enough of it today. And I hope our listeners out there have found this interesting. I think it's always really great to hear from absolute domain experts. I find it really uplifting. So thank you for taking the time out of your busy day to join us.
For those who are asking, I'll just wrap up by saying that we are currently working hard on our next courses here at Virtual Pair Programmers. We'll be making an announcement about what is coming up very soon. So do keep an eye out on our website and our Facebook page if you want to know more. But for now, thank you for listening and see you next time.