GraphQL

SEEK: Establishing a new API integration platform


Let’s talk about the deep connection between business and technology, specifically as applied to SEEK: the journey we’ve taken to transition to a brand new API integration platform, and how the technology choices we’ve made have accelerated some of our business outcomes.

Before I do that, we need a brief history lesson. SEEK was founded in 1997 (TN: by brothers Paul and Andrew Bassat and Matthew Rockman, essentially as an online version of print employment classifieds), with the most well-known product being SEEK.com.au. We’ve grown a lot, so we’ve needed to undertake a few mammoth technology transformations over the years. The first was moving away from the old days of fortnightly release cycles and monolithic applications towards continuous delivery. The second, which we’re currently embarking on, is unifying the technology platforms underpinning our several Asia Pacific job boards (JobsDB and JobStreet). They are all separate entities right now with separate technology stacks, and that is a big problem.

So let’s think specifically about SEEK.com.au for a moment. In basic user terms, we have SEEK hirers posting jobs and SEEK candidates applying for those jobs using the SEEK website. Hirers can post directly using the SEEK website, or they can post jobs via third-party software. We have a whole bunch of integrators like JobAdder, BroadBean and idibu that post jobs directly to SEEK through API integrations. Interestingly enough, the job ads posted through those API integrations make up a higher percentage of posted jobs than those posted through our SEEK website. It is therefore a pretty big domain! And my team, put simply, builds and manages everything to do with these API integrations.

We have 97 third-party recruitment providers that all connect to SEEK, with around 130,000 jobs posted and 1.2 million job applications received via our APIs per month, and 45% of SEEK’s revenue comes through our API channel. That represents a lot of traffic and a lot of sensitive data.

Moving on to the SEEK API integrations landscape. Traditionally, we had regional APIs servicing those different job boards around Asia Pacific. Within those regions or companies, we also had several different APIs with different authentication methods and inconsistent integration approaches. These were built by many different people and teams over many years, and some have degraded over time. You can probably tell it’s not really manageable, and there’s no safe way to manage improvements to APIs across our APAC brands. Rolling out new features is painstaking and maintaining the legacy code is quite hard. Naturally, this can’t scale globally, so we needed to pivot.

Today I’m going to introduce you to the SEEK API. This is our brand new API integration platform. From a business perspective, it’s been built for two primary reasons. First and foremost, we want to improve our partner integration experience, prove our value and really get into the weeds of the developer experience as well. Second, we want to solve an ever-growing problem within SEEK: our indirect API channel can’t scale and add value as fast as our classic website can. It has always been a bit behind our company’s global ambitions in terms of adding value for our hirers and partners.

So we’ve moved away from many different APIs servicing different job boards with different authentication methods to more of a SEEK all-you-can-eat buffet, known as the SEEK API. It’s built on a core technology stack of GraphQL, TypeScript and HAL, all hosted on Amazon Web Services using ECS Fargate. We use Auth0 for authenticating our partners, hirers and developers. And there’s a swarm of resource APIs and microservices that sit behind our central GraphQL layer and back the core integration. We can now continue to expand on that feature set and enrich it, with one integration pattern and one authentication method.

What is GraphQL? GraphQL is a typed query language and a runtime, with implementations for most popular programming languages. Within GraphQL, you define a schema of objects or types, as well as queries and mutations, which behind the scenes essentially call many different traditional REST APIs to resolve the requested data.
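To make the schema, query and mutation idea a little more concrete, here is a minimal sketch in TypeScript. Everything in it is an assumption for illustration: the type and field names (`JobAd`, `Hirer`, `closeJobAd`) are hypothetical rather than taken from the SEEK API, and the two in-memory maps stand in for the backend resource APIs that real resolvers would call over HTTP. It also resolves one query by hand, which is the work a GraphQL runtime would normally do for you.

```typescript
// Hypothetical schema in GraphQL's schema definition language (SDL).
// Type and field names are illustrative, not taken from the SEEK API.
const typeDefs = `
  type Hirer { id: ID! name: String! }
  type JobAd { id: ID! title: String! open: Boolean! hirer: Hirer }
  type Query { jobAd(id: ID!): JobAd }
  type Mutation { closeJobAd(id: ID!): JobAd }
`;

interface Hirer { id: string; name: string }
interface JobAdRecord { id: string; title: string; open: boolean; hirerId: string }

// In-memory stand-ins for backend resource APIs. In a real service these
// would be asynchronous HTTP calls to separate REST endpoints.
const jobAdService = new Map<string, JobAdRecord>([
  ["job-1", { id: "job-1", title: "Principal Developer", open: true, hirerId: "hirer-1" }],
]);
const hirerService = new Map<string, Hirer>([
  ["hirer-1", { id: "hirer-1", name: "Example Pty Ltd" }],
]);

// Resolvers map each schema field to a backend lookup.
const resolvers = {
  Query: {
    jobAd: (id: string): JobAdRecord | null => jobAdService.get(id) ?? null,
  },
  Mutation: {
    closeJobAd: (id: string): JobAdRecord | null => {
      const jobAd = jobAdService.get(id);
      if (jobAd === undefined) return null;
      const closed = { ...jobAd, open: false };
      jobAdService.set(id, closed);
      return closed;
    },
  },
  JobAd: {
    hirer: (jobAd: JobAdRecord): Hirer | null => hirerService.get(jobAd.hirerId) ?? null,
  },
};

// Hand-rolled resolution of: { jobAd(id: "job-1") { title open hirer { name } } }
// A GraphQL runtime would walk the query and call these resolvers for us.
function resolveJobAd(
  id: string,
): { title: string; open: boolean; hirer: { name: string } | null } | null {
  const jobAd = resolvers.Query.jobAd(id);
  if (jobAd === null) return null;
  const hirer = resolvers.JobAd.hirer(jobAd);
  return {
    title: jobAd.title,
    open: jobAd.open,
    hirer: hirer === null ? null : { name: hirer.name },
  };
}
```

The point of the sketch is that the consumer only sees the schema. Which backend service answers each field stays hidden behind the resolvers.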

Why GraphQL? Why did we choose it over a traditional REST API? For us, it has huge benefits first and foremost for our consumers, our partners, and as a side effect it benefits SEEK as well. On the positive side, we’ve got reduced partner integration effort: one API, one integration pattern, one authentication method, and it requires less support overhead. The schema is consumer-centric, designed for our consumers, and it’s industry standard. On our side, because we’re obscuring all of our internal concepts, we can evolve the backend as much as we want without affecting partners. It’s less chatty for the partner, because they’re making fewer API calls, partly because we’ve introduced webhooks as a notification mechanism, but also because they can request multiple things in one POST request. The great thing about GraphQL is that you can request multiple objects at the same time. And finally, it’s interactive. GraphQL does a whole heap of things for you and the consumer: everything becomes self-documented, which saves us a heap of time.
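As a rough illustration of requesting multiple objects in one POST request, the sketch below builds a single GraphQL request body that asks for two unrelated objects at once. The field names (`hirer`, `jobAd`) and the helper `buildHirerAndJobAdRequest` are hypothetical and only show the shape of such a request. They are not the SEEK API schema.

```typescript
interface GraphQLRequest {
  query: string;
  variables: Record<string, string>;
}

// Hypothetical query: one document that selects two top-level objects.
function buildHirerAndJobAdRequest(hirerId: string, jobAdId: string): GraphQLRequest {
  const query = `
    query ($hirerId: ID!, $jobAdId: ID!) {
      hirer(id: $hirerId) { name }
      jobAd(id: $jobAdId) { title }
    }`;
  return { query, variables: { hirerId, jobAdId } };
}

// The serialised payload would be sent as the body of a single HTTP POST.
const requestBody: string = JSON.stringify(buildHirerAndJobAdRequest("hirer-1", "job-1"));
```

With a REST design, the same data would typically need one call per resource. Here the partner makes one call and names exactly the fields it wants back.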

GraphQL resolves data en masse and builds out a complex object pattern and schema for returning data. Our GraphQL server also automatically spins up what’s called the playground. This is a web-based UI that allows your clients to write queries and fetch data. SEEK offers it with separate authentication credentials, isolated from production but living on the same server. It also allows a partner’s developers to play around with the schema. We also have a really great website that details our entire GraphQL schema, including queries, mutations, inputs and objects; the website automatically builds itself off our live GraphQL schema. It lives at developer.seek.com/schema, if you want to check it out.

So GraphQL is only one part of the SEEK API. I’m going to showcase the four pillars that underpin the SEEK API, which encompass a whole bunch of different tooling that we’ve built.

I’ll talk about self-service first. In the old days, there were a number of activities a partner would need to do in order to get set up on our APIs, from both a development and a go-live perspective. In the modern world, we’ve really tried to focus on the developer experience and make the whole process self-service through documentation. Even if you’re not planning on integrating with SEEK, I really encourage you to look at the developer site; we’ve put a huge amount of effort into it. Every time we make a deployment, we make sure that the developer site’s code samples are tested against our playground, so the developer site stays accurate.

The second thing we’ve done is build what’s called the developer dashboard. Developers can access a portal to manage their webhook subscriptions, view events, replay events, view usage statistics, set up notifications and manage hirer relationships. We’ll be adding the ability to manage API credentials, and also to embed audit logs so you can check when you’ve sent requests with validation errors to our API, for debugging and all sorts of things like that.

We got a bit bored one day and decided to make an internal fake third-party recruitment platform, lovingly called Ryanair. It can post jobs to SEEK, receive applications via the SEEK API and use a variety of other features. It’s designed to demo to our partners who are onboarding, and it’s also used to test our integrations in the live environment. We love it, and it was fun to build. It’s built using React, GraphQL and a DynamoDB database, and it’s integrated with Braid, which is our global design system.

After Ryanair was built, we wanted to open source everything we possibly could, so we decided to build a reference implementation called “Wingman”, with front-end and back-end code. Partner developers can use it to integrate with the SEEK API using components taken out of Ryanair. We’re still evolving it as we go and it’s not perfect, but you can check it out at https://github.com/seek-oss/wingman

Second pillar: observable. One of the biggest challenges as API developers is knowing who is using what within your API, so you can effectively manage breaking changes, write new features and decommission endpoints. It also helps us stay close to customers, because as API developers you can very easily build for yourself and not for others. We have an internal version of the GraphQL schema website I mentioned earlier. It allows us to drill into our entire GraphQL schema and look at which of our third-party partners use each part of it, whether it is queries, mutations or anything else, all the way down to the field level. Before we deploy changes to our GraphQL schema, we have automated scripts that programmatically check this for us to ensure we don’t push any breaking changes. We built our own proprietary system here.

Secondly, we’ve integrated with a SaaS status page to automatically provide partners with real-time information on incidents and other critical messaging. We’ve hooked this up to our internal monitoring platforms and everything is automated. We do always have a person in between, though. We have automated systems monitoring our stack, but before an incident goes up on the website, we always have a dev take the time to say “yes, we want it to go to the website”.

On the internal front, we built this amazing tool called Back Office, which gives our API support team access to information on all the objects within the SEEK API: jobs, hirers and partners. You can onboard new partners and manage hirer-partner relationships the same way the developer dashboard can. Having this kind of visibility is fantastic for developers as well, because we can support the product we’re building. Sifting through all the logs is not always the best way to diagnose a production issue; sometimes it could just be down to configuration.
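The pre-deployment check on schema changes described above could look roughly like the sketch below. SEEK’s actual system is proprietary and not described in detail, so this is only an assumed shape: fields are written as `Type.field` strings, usage is a map from field to the partners that have queried it, and `findBreakingRemovals` is a hypothetical helper that flags removed fields which at least one partner still uses.

```typescript
// Field coordinates are written as "Type.field".
// The usage map records which partners have queried each field.
type FieldUsage = Map<string, string[]>;

interface BreakingChange {
  field: string;
  affectedPartners: string[];
}

// Returns every field that exists today, is missing from the proposed
// schema, and is still used by at least one partner.
function findBreakingRemovals(
  currentFields: string[],
  proposedFields: string[],
  usage: FieldUsage,
): BreakingChange[] {
  const proposed = new Set(proposedFields);
  return currentFields
    .filter((field) => !proposed.has(field))
    .map((field) => ({ field, affectedPartners: usage.get(field) ?? [] }))
    .filter((change) => change.affectedPartners.length > 0);
}
```

A deployment script could fail the build whenever this list is non-empty. Removing a field nobody uses passes the check, which is what makes field-level usage data so useful.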

Third pillar: scalable. We want to go global, and to do that we need to build an API that’s ready to function for completely different markets, brands, feature sets, multiple job boards, and users from all walks of life. The first thing we tried to do with the objects within the SEEK API was to invent a different way of identifying them. The SEEK API is built for a global audience, so the IDs that we use to identify these objects need to be globally unique. As mentioned, we have several job boards and different data sources on different scales. To solve this, we invented the Object Identifier pattern, so we can scale this model to any job board in any country.

As mentioned earlier, we also have webhooks. Our old APIs just had polling mechanisms, where partners continuously call a REST API for applications and other things. For the SEEK API, we use industry-standard webhooks, which are scalable, reliable and these days really secure. We emit several different events, like when a job goes live or expires, or when a candidate application comes through for a job. Webhooks can be set up via the developer dashboard mentioned earlier.

Migrating your API partners to a whole new API is really hard on every team. When we started by migrating a couple of our existing partners to the SEEK API while we were in the alpha and beta phases, it became quite clear that partners need a lot of hand-holding in this respect. With nearly 100 partners to migrate within a year, we needed a separate team to manage the rollout, so we spun up a brand new team called the Implementation Squad. The Squad is business focused. Its primary job is to contact our existing API partners, inform them about the new API, get them to design a migration timeline, kick off with them, and then hold their hand throughout that process. We rotate developers through that team purely to give the Implementation Squad some technical support. Eventually, hopefully, we won’t need to do that anymore.
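Going back to the globally unique object identifiers described above, here is one way such an identifier could be sketched. The colon-separated `scheme:type:localId` layout and the names `formatOid` and `parseOid` are assumptions made for illustration. The talk does not spell out the actual format, so treat this as the general idea only: prefixing a local ID with where it came from and what kind of object it is.

```typescript
interface ObjectIdentifier {
  scheme: string;  // hypothetical: which job board or data source owns the object
  type: string;    // hypothetical: what kind of object it is, e.g. "jobAd"
  localId: string; // the ID as known inside that data source
}

const SEPARATOR = ":";

// Joins the parts into one string, rejecting parts that would make
// the identifier ambiguous when parsed back.
function formatOid(oid: ObjectIdentifier): string {
  const parts = [oid.scheme, oid.type, oid.localId];
  if (parts.some((part) => part.length === 0 || part.indexOf(SEPARATOR) !== -1)) {
    throw new Error("Identifier parts must be non-empty and must not contain ':'");
  }
  return parts.join(SEPARATOR);
}

// Splits an identifier back into its parts, rejecting malformed input.
function parseOid(value: string): ObjectIdentifier {
  const parts = value.split(SEPARATOR);
  const [scheme, type, localId] = parts;
  if (parts.length !== 3 || !scheme || !type || !localId) {
    throw new Error(`Invalid object identifier: ${value}`);
  }
  return { scheme, type, localId };
}
```

With a layout like this, two job boards can both have a job ad with local ID 123 and the identifiers still never collide, because the scheme part differs.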

Fourth pillar: consistent. The big problem with building APIs is that developers move on and new developers take the reins, so how do you ensure you can maintain the API long term? An important part of being consistent is being consistent in delivery. We do many small deployments a day rather than one big feature release every few weeks. This is really vital for reducing incidents, because you can track incidents back to small changes, which are much easier to roll back. And it’s great for developers, because the delivery of everything you’re building is sped up. Continuous delivery is fairly common practice these days. To do this, we use Buildkite as our CI tool and do between 30 and 40 deployments a day across a variety of services, ranging from small version upgrades of third-party dependencies to full-blown features.

Of course, this needs governance, so we have strict automated processes that help prevent issues in production. We have a standard set of unit testing and linting across the code bases we’re changing. On top of that, we run integration tests against local snapshots of our databases and mocks of our API dependencies, using Docker. We use GraphQL Inspector to detect whether any schema changes will affect partners before they go out the door. We’ve got an automated smoke testing system that runs a whole suite of high-level tests against any new deployment. There’s also a tester we built called the Fuzzer, which generates random request data and puts load on the API to see if anything unexpected happens. This is good when you’re testing a fairly risky change and want to get some coverage; it’s usually a long-running test, so you don’t need to run it every time. And finally, any new changes, as I mentioned, are automatically validated against the examples on the developer site. If you break something on the developer site, obviously, you need to fix the developer site or update it to match the new code.
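To make the Fuzzer idea above more concrete, here is a small sketch of random request generation. It is not SEEK’s Fuzzer. The `JobAdInput` shape, the value ranges and the function names are all hypothetical, and a real fuzzer would send the generated inputs to the API under load rather than to a local function. A seeded generator is used so that any failing input can be reproduced from its seed.

```typescript
// Small seeded pseudo-random generator (a linear congruential generator),
// so a failing run can be replayed from the same seed.
function createRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Hypothetical request payload, for illustration only.
interface JobAdInput {
  title: string;
  salary: number;
  location: string | null;
}

// Builds a string of random printable ASCII characters, possibly empty.
function randomString(random: () => number, maxLength: number): string {
  const length = Math.floor(random() * (maxLength + 1));
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(32 + Math.floor(random() * 95));
  }
  return out;
}

// Deliberately includes awkward values: empty strings, negative
// salaries and missing locations.
function generateFuzzInput(random: () => number): JobAdInput {
  return {
    title: randomString(random, 200),
    salary: Math.floor(random() * 2000001) - 1000000,
    location: random() < 0.2 ? null : randomString(random, 50),
  };
}

// Runs the target against many random inputs and collects every
// input that made it throw.
function fuzz(target: (input: JobAdInput) => void, iterations: number, seed: number): JobAdInput[] {
  const random = createRandom(seed);
  const failures: JobAdInput[] = [];
  for (let i = 0; i < iterations; i++) {
    const input = generateFuzzInput(random);
    try {
      target(input);
    } catch (error) {
      failures.push(input);
    }
  }
  return failures;
}
```

Calling `fuzz(handler, 1000, 42)` twice with the same handler yields the same inputs both times, so a failure found in a long-running test can be replayed later while debugging.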

What’s the most important thing for a good API? Documentation, of course. On the internal side, we have a whole suite of internal documentation, ranging from API design guidelines and support documentation for all the services we own to how-tos and answers to common partner questions, which are really useful as we rotate through the Implementation Squad. Our internal readme is really descriptive, with diagrams and important information, so if a new developer joins, they should have everything they need to get started and to carry on. The final aspect under the consistent pillar is HR Open Standards. HR Open Standards provides industry-standard guidance for structuring data in the human resources sector. It’s informed by professionals and organisations within the sector. The SEEK GraphQL schema is pretty closely aligned to HR Open Standards, and automatic validation ensures it stays consistent. We also have a lot of proprietary things that are specific to SEEK, but we try to keep a pretty open mind to the industry.

We’ve set ourselves a pretty ambitious goal: we’re going to decommission our old APIs and have everyone using the SEEK API by the end of June next year, even though we only started a couple of months ago. So far, we’re on track.

Q&A Section

Q: If you had your time again, what would you have done differently? With the lessons that you learn? With the benefit of hindsight?

A: It’s not often that you get a chance to reflect on that, is it? I think you can quite easily get into the weeds of delivery and not reflect back. As a developer, it’s really all about talking to your API consumers. It’s very important to get their input straight away, before you build any API. We have weekly check-in meetings with all of our integrators, just to get up to speed, and we always have developers on that call. It’s usually not just a business conversation. It’s a mixture of technical and business conversation about the challenges they’re facing from a business standpoint, as well as the integration challenges and things that developers can specifically respond to.

Q: How hard is it to sunset old API suites and push the adoption of a new suite?

A: It is a very involved process. It starts mainly with the Implementation Squad, who reach out and start to determine timelines and things like that. We ended up going through a fairly long process with one of our biggest partners that was onboarding with us recently. It took three or four months of continuous back and forth, trying to resolve issues and test everything, to finally get through that process. We were in the alpha or beta phase at the time, so it took a lot of developer involvement. We were learning things about our API that we’d never seen before, and it required good feedback and reflection.

Q: The hardest part is getting the external consumers to successfully migrate over to the new versions, because they are impacted and they have to bear the cost of rebuilding against a new API suite.

A: We also have two different consumers: we’ve got SEEK hirers that are posting jobs, and we’ve got partners that are integrating with the API. We need to consider both in that scenario, which is very difficult to balance. Quite often, a partner might not want to integrate with a part of your API, but a hirer does, because the hirer really wants a value proposition that the partner sees no value in supporting.

Q: What are the steps you’ll take to ensure APIs are decommissioned by June? Governance steps?

A: What we’ve tried to do with the SEEK API is build it pretty much entirely independently. It doesn’t have any connection to the legacy APIs, bar a few object references like IDs. Building it independently means that the moment the last partner rolls off, at the end of June next year, we can literally just switch everything off. Decommissioning becomes more about getting partners off the old APIs, rather than having technical debt woven through your application. There are obviously challenges, as SEEK hirers have jobs posted using the old legacy APIs and we have to migrate data, but we work through it with communication and extensive documentation.

Jonathan Cleary

Principal Developer at SEEK
Jon is the Principal Developer at SEEK's Indirect Job Posting Team and leads one of the development teams that has built our new SEEK API.
