When Zip’s director of security and IT Peter Robinson joined the company two-and-a-half years ago, he was its only security staffer.
The company – ASX-listed and a major player in the ‘buy now, pay later’ (BNPL) boom – has scaled up rapidly in that time, growing from 100 people overall to over 1000.
The story of that scale-up, and the technology challenges it brought, is told in this week’s CXO Challenge interview on the iTnews podcast.
The security team remains relatively lean.
“We’re a pretty small team at Zip,” Robinson said. “We recently got some additional folks, but my first year and a half at Zip, I was there on my own, and then I got two more people.
“So up until about midway through last year, we were three people running this global entity and multi-cloud account infrastructure.”
During those two-and-a-half years, Robinson expanded his remit to cover IT as well, an embodiment of an internal motto at Zip to take responsibility for challenges as they appear.
“We have a motto [and] we wear it on our sleeves, hashtag ownit,” Robinson said, pointing the sleeve of his Zip hoodie at the laptop camera.
“That is kind of a company slogan. It’s kind of a marketing slogan, but it’s also an internal staff slogan.
“We do it like that. If you see a problem, just own it until it’s fixed.
“That’s why I run IT as well, because no one else was doing that. So you kind of go, ‘Well, I can do that, too’.”
Unpacking Zip’s infrastructure
Within Robinson’s core cyber security domain, Zip is a challenging environment to secure.
For starters, there’s the company’s rapid growth to contend with. Riding the BNPL wave, Zip has grown out of its native Australia into the UK, US, Europe, the Philippines, Japan, New Zealand, and South Africa, “all within the last two years” and largely by acquisition.
“It’s been a very, very rapid expansion – hyper growth – and I think if anyone was to take a look at it, they’ll see that cyber security is very, very challenging under those conditions,” Robinson said.
“I think one of the biggest challenges through that acquisition process is that as the company acquires new businesses, of which there’ve been seven or eight in recent times, just bringing them on board and getting our security technology and our capabilities across those – they’re disparate and they have different processes, technologies and CI/CD pipelines and everything else.”
Aside from a handful of physical firewalls and wifi access points, Zip is also cloud-only. Its Australian operations run out of AWS, while its international operations are on Azure.
The company runs multi-cloud but also multi-account; it had “six or seven” accounts two-and-a-half years ago, and 43 today. Different accounts are used for stages of the development lifecycle or are otherwise arranged by purpose.
Zip also runs an “ephemeral infrastructure” operating model – backed by serverless compute and infrastructure-as-code – spinning resources up and down on-demand.
“We have very ephemeral environments with ephemeral assets, so if we’re in full flight here in Australia, in the middle of the day when our backend systems are running at their peak, we can spin up anywhere between 1500 and 2000 server systems to do approvals and things like that, and then by five o’clock in the afternoon, those are all gone again,” Robinson said.
Robinson gives the example of unsecured business loans. Zip has set up its infrastructure so that compute is provisioned in a way that does not hold up applications.
“We have a whole backend automated system that allows us to validate and verify whether or not we should be doing loans for people,” he said.
“The decisioning engine is based on quite a substantial amount of rules engine-type systems and a bit of machine learning and external API connectivity to credit bureaus and to social systems and things like that.
“To keep the latency down for people applying – you don’t want to make people wait while they’re trying to apply for a loan – we’ll actually spin things up in real-time, so as an application comes through we’ll spin up a decisioning engine just for that particular thing, or we’ll pre-empt it based on predicted load for the day, so that we keep our latency down and allow our systems to operate when demand is high.
“We literally have systems coming and going throughout the day.”
Vulnerability scanning project
The operating model made vulnerability management challenging.
Until last year, Zip used the open source vulnerability assessment scanner OpenVAS to scan for vulnerabilities, misconfigurations and other issues.
“It’s freely available on the internet and then we had some API integration with our Amazon infrastructure to pull sets of IP addresses of currently running assets out of there and feed it, so we had to script a lot of stuff, and actually manually, in some cases, feed the scanners so that they actually knew what to look for,” Robinson said.
However, the tool provided coverage for only 30 to 40 percent of Zip’s infrastructure at any one time, and Robinson had been looking for some time to drive that up to 100 percent.
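The scripted feed Robinson describes can be pictured roughly as follows. This is only a minimal sketch, not Zip’s actual tooling: it assumes the inventory arrives in the shape of an AWS EC2 DescribeInstances response, and the function names and the comma-separated target format are hypothetical.

```python
from typing import Any


def running_private_ips(describe_instances_response: dict[str, Any]) -> list[str]:
    """Collect private IPs of running instances from an EC2
    DescribeInstances-shaped response (Reservations -> Instances)."""
    ips = set()
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            ip = instance.get("PrivateIpAddress")
            # Terminated or pending instances have nothing worth scanning.
            if state == "running" and ip:
                ips.add(ip)
    return sorted(ips)


def scanner_target_list(ips: list[str]) -> str:
    """Join IPs into a comma-separated target string (hypothetical format
    for whatever scanner is being fed)."""
    return ",".join(ips)


# In a live setup the response would come from something like
# boto3.client("ec2").describe_instances(), paginated and repeated per account.
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-1", "State": {"Name": "running"},
             "PrivateIpAddress": "10.0.1.5"},
            {"InstanceId": "i-2", "State": {"Name": "terminated"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-3", "State": {"Name": "running"},
             "PrivateIpAddress": "10.0.0.9"},
        ]},
    ]
}
targets = scanner_target_list(running_private_ips(sample))
```

A list built this way is a snapshot. In an environment where servers live for minutes or hours, it starts going stale as soon as it is generated, which is consistent with the partial coverage described above.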
“I think with the cloud comes a particular challenge, particularly when you have that many ephemeral assets coming and going,” he said.
“Traditional vulnerability scanning devices require you to buy endpoint licensing volumes, if you know what I mean, so for every server, you’ve got to buy a license. And for me, that was just a crazy thing.
“Secondly, they want to deploy endpoint agents onto those devices, which again, is a crazy thing given that I’ve got these assets scaling up and down.
“Another mechanism that’s used often to scan for vulnerabilities is network-based scanning, which requires credentials on the endpoint and network access. Again, trying to keep up with the fast-moving environment and the infrastructure that’s continuously changing to try and keep those things up to date in there was nuts.”
Robinson said he was then introduced to Orca Security, which promises a way to “detect vulnerabilities, malware, misconfigurations, lateral movement risk, authentication risk, and insecure high-risk data, [and] then prioritise risk based on the underlying issue, its accessibility, and blast radius – without deploying agents.”
The tool works with all three major public cloud providers; Robinson said Zip ran a trial, found it worked, and put it into production in 2020.
Full coverage of Zip’s infrastructure, as well as the prioritisation of problems to fix, are key benefits, Robinson said.
“We can have a couple of hundred thousand vulnerabilities on systems, some of them being informational or low-level or whatever but it’ll say, ‘here are your 72 that you should be caring about today because of these reasons’, and that’s where our human process then starts, whereas before our human process would start much further up,” he said.
“I think the biggest problem that it solves is it’s added a whole bunch of additional resources to my capability, without me having to get analysts and people to trawl through disparate data systems and trying to figure out how to prioritise today’s work. It’s saved us a lot of time and effort.”
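The idea of narrowing a large pile of findings down to a short daily list can be sketched as below. This is not Orca Security’s algorithm, which the article does not describe; the fields, weights and severity threshold are assumptions chosen purely to illustrate ranking by underlying issue, accessibility and blast radius.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    asset: str
    title: str
    severity: float        # assumed CVSS-style score, 0.0 to 10.0
    internet_facing: bool  # stand-in for "accessibility"
    reachable_assets: int  # rough stand-in for "blast radius"


def risk_score(f: Finding) -> float:
    """Hypothetical score: severity weighted by exposure and blast radius."""
    exposure = 2.0 if f.internet_facing else 1.0
    blast = 1.0 + min(f.reachable_assets, 50) / 50  # capped at 2.0
    return f.severity * exposure * blast


def todays_shortlist(findings: list[Finding], limit: int) -> list[Finding]:
    """Drop informational and low-level findings, then keep the top `limit`."""
    actionable = [f for f in findings if f.severity >= 4.0]
    return sorted(actionable, key=risk_score, reverse=True)[:limit]


findings = [
    Finding("batch-worker", "outdated TLS library", 9.8, False, 0),
    Finding("public-api", "exposed admin endpoint", 7.5, True, 25),
    Finding("static-site", "verbose server banner", 2.0, True, 50),
]
shortlist = todays_shortlist(findings, limit=2)
```

In this toy data the internet-facing finding outranks the one with the higher raw severity, which is the kind of reordering that lets the human process start from a short list rather than the full set.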
Towards self-healing infrastructure
While the company already makes use of infrastructure-as-code, it is hoping to introduce even more automation to its IT environment to ultimately make it self-healing.
“This year, I’m hoping for some increased automation of our capabilities, where we actually mature to the point where the systems and infrastructure is stable enough that our problem-finding tools can drive our problem-fixing tools,” Robinson said.
“At the moment, you’ve got problem-finding tools, and then you’ve got problem-fixing tools, and there’s a big ‘people process’ in between there to make decisions about things.
“There’s a learning process that goes along with that as well before you can trust things to automatically self-heal and repair and stuff like that. But that’s the next step. That’s the plan.”
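One way to picture problem-finding tools driving problem-fixing tools, with the ‘people process’ kept as a gate until trust is established, is sketched below. Everything here is hypothetical: the finding types, the fix functions and the trusted set are illustrative assumptions, not a description of Zip’s plans.

```python
from typing import Callable

Remediation = Callable[[str], str]


def restrict_open_port(asset: str) -> str:
    """Hypothetical fix: stands in for a real infrastructure-as-code change."""
    return f"restricted inbound rule on {asset}"


def rotate_stale_key(asset: str) -> str:
    """Hypothetical fix: stands in for a real credential rotation."""
    return f"rotated access key for {asset}"


# Hypothetical mapping from finding type to automated fix.
REMEDIATIONS: dict[str, Remediation] = {
    "open_port": restrict_open_port,
    "stale_key": rotate_stale_key,
}

# Finding types the team has learned to trust for unattended self-healing.
TRUSTED_FOR_AUTO_FIX = {"stale_key"}


def handle_finding(kind: str, asset: str, approved_by_human: bool = False) -> str:
    """Route a finding to an automated fix, a review queue, or a ticket."""
    fix = REMEDIATIONS.get(kind)
    if fix is None:
        return f"no automated fix for {kind}; ticket raised for {asset}"
    if kind in TRUSTED_FOR_AUTO_FIX or approved_by_human:
        return fix(asset)
    return f"{kind} on {asset} queued for human review"
```

In this sketch, the learning process Robinson mentions corresponds to gradually moving finding types into the trusted set as confidence in each automated fix grows.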