JSON Schema vs. OpenAPI

Munish Goyal
Published in codeburst
Aug 13, 2020


JSON Schema and OpenAPI can seem similar but have different use-cases.

To begin, how do JSON Schema and OpenAPI differ? JSON Schema describes the structure of JSON data. An OpenAPI document, by contrast, defines an entire API, not just its data models. A fairer comparison is between JSON Schema and the OpenAPI data model (its Schema Object), which is based on JSON Schema.

Why the need to validate JSON?

There are plenty of use cases, but let me explain why I use it:

Enter the world of Kubernetes and you’ll find yourself surrounded by object manifests defined in either YAML or JSON. Maintaining thousands of such manifests can be a nightmare if the same code is repeated across them. Languages like Jsonnet, a lazy data templating language from Google, let you DRY up (Don’t Repeat Yourself) the configuration code. Jsonnet emits JSON and supports template reuse. Because it is lazy, it evaluates code only when the result is actually needed. So not only can it give you a performance boost, it also removes a large share of the maintenance burden.

Good: now we have a lot of Jsonnet code that generates JSON-based manifests. As the number of custom JSON objects grows, we need a way to validate inputs and to generate documentation automatically. This is where you would use OpenAPI.

JSON Schema

JSON Schema, as defined at json-schema.org, is a powerful tool for validating the structure of JSON data.

At its heart, JSON is built on the following data structures:

  • object: for example { "key1": "value1", "key2": "value2" }
  • array: for example [ "first", "second", "third" ]
  • number: for example 42, 3.1415926
  • string: for example "This is a string"
  • boolean: for example true and false
  • null: for example null

These types have analogs in most programming languages, though they may go by different names.

In JSON Schema, an empty object, {}, is a completely valid schema that will accept any valid JSON (any object, number, string, etc.). Since draft-06, you can also use true in place of the empty object to represent a schema that matches anything, or false for a schema that matches nothing.

The most common thing to do in a JSON Schema is to restrict a value to a specific type, which is what the type keyword is for. For example:

{ "type": "string" }

The type keyword may either be a string or an array (in which case the JSON snippet is valid if it matches any of the given types).

{ "type": ["number", "string"] }
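The following is a minimal, hand-written sketch in Python (standard library only) that makes these type semantics concrete. It checks just the type keyword, along with the boolean schemas true and false and the empty schema. The function names are hypothetical. In practice you would rely on an existing validator, such as the Python jsonschema package, rather than code like this. The sketch also covers the integer type, which JSON Schema provides alongside number. One Python-specific wrinkle: bool is a subclass of int, so booleans must be excluded from number and integer explicitly.

```python
# Hypothetical sketch: checks only the "type" keyword.
# Instances are assumed to be parsed JSON (dict, list, str, int, float, bool, None).

def matches_type(instance, type_name):
    """Return True if a parsed JSON value has the given JSON Schema type."""
    if type_name == "null":
        return instance is None
    if type_name == "boolean":
        return isinstance(instance, bool)
    if type_name == "object":
        return isinstance(instance, dict)
    if type_name == "array":
        return isinstance(instance, list)
    if type_name == "string":
        return isinstance(instance, str)
    # bool is a subclass of int in Python, so rule booleans out explicitly.
    if isinstance(instance, bool):
        return False
    if type_name == "number":
        return isinstance(instance, (int, float))
    if type_name == "integer":
        # Since draft-06, a number with a zero fractional part (1.0) is an integer.
        return isinstance(instance, int) or (
            isinstance(instance, float) and instance.is_integer())
    raise ValueError(f"unknown type: {type_name}")


def is_valid_type_only(instance, schema):
    """Validate an instance against a schema that uses only "type"."""
    if schema is True:
        return True   # the schema true accepts everything
    if schema is False:
        return False  # the schema false accepts nothing
    if "type" not in schema:
        return True   # {} places no constraint on the instance
    types = schema["type"]
    if isinstance(types, str):
        types = [types]  # a single type name behaves like a one-element list
    return any(matches_type(instance, t) for t in types)


print(is_valid_type_only("hello", {"type": "string"}))         # True
print(is_valid_type_only(42, {"type": ["number", "string"]}))  # True
print(is_valid_type_only(True, {"type": "number"}))            # False
print(is_valid_type_only({"a": 1}, {}))                        # True
print(is_valid_type_only(None, False))                         # False
```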

Since JSON Schema is itself JSON, it’s not always easy to tell when something is JSON Schema or just an arbitrary chunk of JSON. The $schema keyword declares that something is a JSON Schema, and which draft of the specification it follows. It’s generally good practice to include it, though it is not required. The keywords described in this article match draft-07:

{ "$schema": "http://json-schema.org/draft-07/schema#" }

It is also best practice to include an $id property as a unique identifier for each schema. You can just set it to a URL at a domain you control, for example,

{ "$id": "http://yourdomain.com/schemas/myschema.json" }

Keywords specific to the object type:

  • The properties (key-value pairs) on an object are defined using the properties keyword. The value of properties is an object, where each key is the name of a property and each value is a JSON schema used to validate that property.
  • The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword. By default, any additional properties are allowed. The additionalProperties keyword may be either a boolean or an object. If additionalProperties is a boolean and set to false, no additional properties will be allowed. If additionalProperties is an object, that object is a schema that will be used to validate any additional properties not listed in properties.
  • By default, the properties defined by the properties keyword are not required. However, one can provide a list of required properties using the required keyword. The required keyword takes an array of zero or more strings. Each of these strings must be unique.
  • The names of properties can be validated against a schema using propertyNames, irrespective of their values. This can be useful if you don’t want to enforce specific properties, but you want to make sure that the names of those properties follow a specific convention.
  • The number of properties on an object can be restricted using the minProperties and maxProperties keywords. Each of these must be a non-negative integer.
  • The dependencies keyword allows the schema of the object to change based on the presence of certain special properties. There are two forms of dependencies in JSON Schema:
      • Property dependencies declare that certain other properties must be present if a given property is present. The value of the dependencies keyword is an object. Each entry in the object maps from the name of a property, p, to an array of strings listing the properties that are required whenever p is present.
      • Schema dependencies declare that the schema changes when a given property is present. They work like property dependencies, but instead of just specifying other required properties, they can extend the schema with other constraints.
  • As we saw before, additionalProperties can do one of two things. It can restrict the object so that it has no properties beyond those explicitly listed, or it can specify a schema for any additional properties on the object. Sometimes this isn’t enough. You may want to restrict the names of extra properties, or say that a property whose name follows a particular pattern must have a value matching a particular schema. That’s where patternProperties comes in: it maps regular expressions to schemas. If a property name matches a given regular expression, the property’s value must validate against the corresponding schema.
  • patternProperties can be used in conjunction with additionalProperties. In that case, additionalProperties applies only to properties that are not listed in properties and don’t match any of the patternProperties.
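The interplay of the object keywords above can be sketched as a small, hypothetical checker in standard-library Python. Several simplifications are assumptions of this sketch, not part of JSON Schema. Subschemas are checked only for type (or as boolean schemas). Patterns use Python’s re module rather than the ECMA-262 dialect the specification names. propertyNames and schema dependencies are left out. The example schema, with its x- prefix and credit-card dependency, is made up for illustration.

```python
import re

# Simplified subschema check: only "type" and boolean schemas are handled.
TYPES = {"string": str, "number": (int, float), "integer": int,
         "boolean": bool, "object": dict, "array": list, "null": type(None)}


def sub_ok(value, subschema):
    if isinstance(subschema, bool):
        return subschema
    expected = subschema.get("type")
    if expected is None:
        return True
    if isinstance(value, bool) and expected != "boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, TYPES[expected])


def object_errors(instance, schema):
    """Return error messages for a subset of the object keywords.

    Assumes the instance is already known to be an object (a dict)."""
    errors = []
    props = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    additional = schema.get("additionalProperties", True)

    for name, value in instance.items():
        if name in props and not sub_ok(value, props[name]):
            errors.append(f"property {name!r} fails its schema")
        matched = False
        for regex, subschema in patterns.items():
            if re.search(regex, name):  # patterns are not implicitly anchored
                matched = True
                if not sub_ok(value, subschema):
                    errors.append(f"property {name!r} fails pattern {regex!r}")
        # additionalProperties only sees names that are neither listed in
        # properties nor matched by any patternProperties regex.
        if name not in props and not matched and not sub_ok(value, additional):
            errors.append(f"additional property {name!r} fails additionalProperties")

    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property {name!r}")

    if len(instance) < schema.get("minProperties", 0):
        errors.append("too few properties")
    if "maxProperties" in schema and len(instance) > schema["maxProperties"]:
        errors.append("too many properties")

    # Only the property-dependency form (a list of names) is handled here.
    for trigger, needed in schema.get("dependencies", {}).items():
        if trigger in instance and isinstance(needed, list):
            errors.extend(f"{trigger!r} requires {name!r}"
                          for name in needed if name not in instance)
    return errors


person = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"},
                   "credit_card": {"type": "string"},
                   "billing_address": {"type": "string"}},
    "patternProperties": {"^x-": {"type": "string"}},
    "additionalProperties": False,
    "required": ["name"],
    "dependencies": {"credit_card": ["billing_address"]},
}

print(object_errors({"name": "Ada", "x-team": "core"}, person))  # []
print(object_errors({"age": "ten", "nickname": "A", "credit_card": "4111"}, person))
```

The second call reports four problems: a wrongly typed age, a disallowed nickname, the missing required name, and the unmet credit_card dependency. Note that x-team passes only because patternProperties claims it. Without that pattern, additionalProperties: false would reject it.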

Keywords specific to the string, number, boolean and null types:

  • The length of a string can be constrained using the minLength and maxLength keywords.
  • The pattern keyword is used to restrict a string to a particular regular expression.
  • The format keyword allows for basic semantic validation of certain commonly used kinds of string values, such as date-time, email and uri. See the list of built-in formats in the JSON Schema specification.
  • Numeric ranges are specified using the minimum and maximum keywords (inclusive bounds), or exclusiveMinimum and exclusiveMaximum for exclusive bounds.
  • The boolean type matches only two special values: true and false. Note that values that evaluate to true or false, such as 1 and 0, are not accepted by the schema.
  • The null type is generally used to represent a missing value. When a schema specifies a type of null, it has only one acceptable value: null.
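Here is a hypothetical, standard-library Python sketch of how the string and number keywords above might be checked. It makes two behaviours explicit. First, these keywords only constrain values of the matching type, so a number keyword is simply ignored for a string. Second, pattern is not implicitly anchored. It assumes the draft-06+ form of exclusiveMinimum and exclusiveMaximum, where they are numbers rather than booleans. It also uses Python’s re module as a stand-in for the ECMA-262 regex dialect that JSON Schema specifies; the two agree on simple patterns like these.

```python
import re


def scalar_errors(instance, schema):
    """Check the string and number keywords (a sketch, not a full validator)."""
    errors = []
    if isinstance(instance, str):
        # Lengths count characters (code points), which is what len() returns.
        if len(instance) < schema.get("minLength", 0):
            errors.append("string too short")
        if "maxLength" in schema and len(instance) > schema["maxLength"]:
            errors.append("string too long")
        # pattern is not anchored, hence re.search rather than re.fullmatch.
        if "pattern" in schema and not re.search(schema["pattern"], instance):
            errors.append("string does not match pattern")
    elif isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append("below minimum")
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append("above maximum")
        # Draft-06+ form: exclusive bounds are numbers themselves.
        if "exclusiveMinimum" in schema and instance <= schema["exclusiveMinimum"]:
            errors.append("not above exclusiveMinimum")
        if "exclusiveMaximum" in schema and instance >= schema["exclusiveMaximum"]:
            errors.append("not below exclusiveMaximum")
    return errors


zip_code = {"type": "string", "minLength": 5, "maxLength": 10,
            "pattern": "^[0-9]{5}(-[0-9]{4})?$"}
percent = {"type": "number", "minimum": 0, "exclusiveMaximum": 100}

print(scalar_errors("12345", zip_code))  # []
print(scalar_errors("1234", zip_code))   # ['string too short', 'string does not match pattern']
print(scalar_errors(100, percent))       # ['not below exclusiveMaximum']
print(scalar_errors("abc", percent))     # [] (number keywords ignore strings)
```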

Keywords specific to the array type:

The items keyword can be used in two ways:

  • Set items to a single schema that will be used to validate all of the items in the array.
  • Set items to an array, where each element is a schema for the corresponding index of the document’s array. That is, the first schema validates the first element of the input array, the second schema validates the second element, and so on.

Other array keywords:

  • While the items schema must be valid for every item in the array, the contains schema only needs to validate against one or more items in the array.
  • The additionalItems keyword controls whether it’s valid to have items in the array beyond those defined in items; it only applies when items is an array. Setting it to false disallows extra items. It can also be a schema that every additional item must validate against.
  • The length of the array can be specified using the minItems and maxItems keywords. The value of each keyword must be a non-negative integer.
  • A schema can ensure that each of the items in an array is unique: simply set the uniqueItems keyword to true.
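The array keywords above can be sketched the same way in standard-library Python. The helper names are hypothetical. Subschemas are deliberately limited to type checks and boolean schemas. The sketch covers both forms of items, and shows that additionalItems only matters in the tuple form. It also includes a JSON-aware equality function for uniqueItems, because JSON treats 1 and 1.0 as equal but true and 1 as different, which Python’s == does not respect.

```python
TYPES = {"string": str, "number": (int, float), "integer": int,
         "boolean": bool, "object": dict, "array": list, "null": type(None)}


def sub_ok(value, subschema):
    """Simplified subschema check: boolean schemas and "type" only."""
    if isinstance(subschema, bool):
        return subschema
    expected = subschema.get("type")
    if expected is None:
        return True
    if isinstance(value, bool) and expected != "boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, TYPES[expected])


def json_equal(a, b):
    """JSON equality: 1 equals 1.0, but true does not equal 1."""
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return type(a) is type(b) and a == b


def array_errors(instance, schema):
    """Check the array keywords for an instance known to be a list."""
    errors = []
    items = schema.get("items", True)
    if isinstance(items, list):
        # Tuple form: items[i] validates instance[i].
        for i, (value, sub) in enumerate(zip(instance, items)):
            if not sub_ok(value, sub):
                errors.append(f"item {i} fails items[{i}]")
        extra = schema.get("additionalItems", True)
        for i in range(len(items), len(instance)):
            if not sub_ok(instance[i], extra):
                errors.append(f"item {i} fails additionalItems")
    else:
        # Single-schema form: every item must match; additionalItems is ignored.
        for i, value in enumerate(instance):
            if not sub_ok(value, items):
                errors.append(f"item {i} fails items")
    if "contains" in schema and not any(sub_ok(v, schema["contains"]) for v in instance):
        errors.append("no item matches contains")
    if len(instance) < schema.get("minItems", 0):
        errors.append("too few items")
    if "maxItems" in schema and len(instance) > schema["maxItems"]:
        errors.append("too many items")
    if schema.get("uniqueItems") and any(
            json_equal(instance[i], instance[j])
            for i in range(len(instance)) for j in range(i + 1, len(instance))):
        errors.append("items are not unique")
    return errors


address = {"type": "array",
           "items": [{"type": "number"}, {"type": "string"}, {"type": "string"}],
           "additionalItems": False}

print(array_errors([1600, "Pennsylvania", "Avenue"], address))        # []
print(array_errors([1600, "Pennsylvania", "Avenue", "NW"], address))  # ['item 3 fails additionalItems']
print(array_errors([1, 1.0], {"uniqueItems": True}))                  # ['items are not unique']
print(array_errors([1, True], {"uniqueItems": True}))                 # []
```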

Generic Keywords:

  • The title and description keywords must be strings. A title will preferably be short, whereas a description will provide a more lengthy explanation about the purpose of the data described by the schema.
  • The default keyword specifies a default value for an item. JSON processing tools may use this information to provide a default value for a missing key/value pair, though many JSON schema validators simply ignore the default keyword. It should validate against the schema in which it resides, but that isn’t required.
  • The enum keyword is used to restrict a value to a fixed set of values. It must be an array with at least one element, where each element is unique. You can use enum even without a type, to accept values of different types.
  • The const keyword is used to restrict a value to a single value.
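In the hypothetical standard-library Python sketch below, enum and const are the only generic keywords that affect validity. Using default is left to the application, because validators are not required to fill it in. A sentinel distinguishes a schema with no const from one whose const is null. The JSON-aware equality keeps true and 1 distinct, as the specification requires. The with_defaults helper and the example schemas are made up for illustration.

```python
_MISSING = object()  # sentinel: "no const keyword" versus "const": null


def json_equal(a, b):
    """JSON equality: 1 equals 1.0, but true does not equal 1."""
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return type(a) is type(b) and a == b


def enum_const_ok(instance, schema):
    """title, description and default never affect validity; enum and const do."""
    if "enum" in schema and not any(json_equal(instance, v) for v in schema["enum"]):
        return False
    const = schema.get("const", _MISSING)
    if const is not _MISSING and not json_equal(instance, const):
        return False
    return True


def with_defaults(instance, schema):
    """Hypothetical application-side helper: fill missing keys from defaults."""
    filled = dict(instance)
    for name, sub in schema.get("properties", {}).items():
        if name not in filled and isinstance(sub, dict) and "default" in sub:
            filled[name] = sub["default"]
    return filled


traffic_light = {"enum": ["red", "amber", "green", None, 42]}  # no "type" needed
deployment = {"title": "Deployment",
              "properties": {"name": {"type": "string"},
                             "replicas": {"type": "integer", "default": 1}}}

print(enum_const_ok("red", traffic_light))         # True
print(enum_const_ok(0, traffic_light))             # False
print(enum_const_ok(1, {"const": True}))           # False: 1 is not true
print(with_defaults({"name": "web"}, deployment))  # {'name': 'web', 'replicas': 1}
```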

Reusing Schemas using $ref:

  • We can refer to a schema snippet from elsewhere using the $ref keyword. The easiest way to describe $ref is that it gets logically replaced with the things that it points to.
  • In draft-07 and earlier, $ref should be the only key in its object: any other keys you put alongside it are ignored by the validator. Draft 2019-09 and later relax this rule.
  • The value of $ref is a URI-reference. The part after the # sign (the “fragment” or “named anchor”) uses a format called JSON Pointer (RFC 6901).
  • If you’re using a definition from the same document, the $ref value begins with the pound symbol, #. Following that, the slash-separated items traverse the keys in the objects in the document.
  • The $ref elements may be used to create recursive schemas that refer to themselves.
  • The $id property is a URI-reference that serves two purposes:
      • It declares a unique identifier for the schema.
      • It declares a base URI against which $ref URI-references are resolved.
  • It is best practice that every top-level schema should set $id to an absolute-URI (not a relative reference), with a domain that you control.
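Resolving a same-document $ref amounts to walking a JSON Pointer through the schema. The following is a minimal, hypothetical sketch in standard-library Python. It percent-decodes the fragment, applies RFC 6901’s ~1 and ~0 escapes in the required order, and treats a bare "#" as the whole document, which is what makes the recursive tree schema work. The schema, its example.com $id and the tiny validator are illustrative assumptions. The sketch also does not resolve references against $id, so only same-document refs work.

```python
from urllib.parse import unquote

TYPES = {"string": str, "number": (int, float), "integer": int,
         "boolean": bool, "object": dict, "array": list, "null": type(None)}


def resolve_local_ref(root, ref):
    """Resolve a same-document $ref such as "#/definitions/label"."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled here")
    pointer = unquote(ref[1:])  # the fragment is percent-encoded
    if pointer == "":
        return root  # "#" points at the whole document
    if not pointer.startswith("/"):
        raise ValueError("plain-name anchors are not handled here")
    target = root
    for token in pointer.split("/")[1:]:
        # RFC 6901 escaping: decode "~1" to "/" first, then "~0" to "~".
        token = token.replace("~1", "/").replace("~0", "~")
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target


def is_valid(instance, schema, root):
    """Tiny validator: $ref, type, properties and single-schema items only."""
    if "$ref" in schema:
        # Draft-07 behaviour: the $ref stands in for the whole object.
        return is_valid(instance, resolve_local_ref(root, schema["$ref"]), root)
    expected = schema.get("type")
    if expected is not None:
        if isinstance(instance, bool) and expected != "boolean":
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub, root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_valid(v, schema["items"], root) for v in instance)
    return True


tree_schema = {
    "$id": "https://example.com/schemas/tree.json",  # hypothetical domain
    "definitions": {"label": {"type": "string"}},
    "type": "object",
    "properties": {
        "name": {"$ref": "#/definitions/label"},
        "children": {"type": "array", "items": {"$ref": "#"}},  # recursion
    },
}

good_tree = {"name": "root", "children": [{"name": "leaf", "children": []}]}
bad_tree = {"name": "root", "children": [{"name": 7}]}
print(is_valid(good_tree, tree_schema, tree_schema))  # True
print(is_valid(bad_tree, tree_schema, tree_schema))   # False
```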

Combining Schemas:

  • To validate against allOf, the given data must be valid against all of the given subschemas (provided as elements of an array).
  • To validate against anyOf, the given data must be valid against at least one of the given subschemas.
  • To validate against oneOf, the given data must be valid against exactly one of the given subschemas.
  • The not keyword declares that an instance validates if it doesn’t validate against the given subschema.
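The four combining keywords map directly onto all, any, a count of matches, and negation. Here is a hypothetical standard-library Python sketch, supporting only a handful of leaf keywords (type, minimum, maxLength, multipleOf). It shows the case where oneOf is stricter than anyOf: 15 is a multiple of both 5 and 3, so it matches two subschemas and fails oneOf.

```python
TYPES = {"string": str, "number": (int, float), "integer": int,
         "boolean": bool, "object": dict, "array": list, "null": type(None)}


def is_number(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def matches(instance, schema):
    """Mini-validator: a few leaf keywords plus allOf/anyOf/oneOf/not."""
    if isinstance(schema, bool):
        return schema
    expected = schema.get("type")
    if expected is not None:
        if isinstance(instance, bool) and expected != "boolean":
            return False
        if not isinstance(instance, TYPES[expected]):
            return False
    if "minimum" in schema and is_number(instance) and instance < schema["minimum"]:
        return False
    if ("maxLength" in schema and isinstance(instance, str)
            and len(instance) > schema["maxLength"]):
        return False
    # Exact for integers; floats may suffer from rounding.
    if ("multipleOf" in schema and is_number(instance)
            and instance % schema["multipleOf"] != 0):
        return False
    if "allOf" in schema and not all(matches(instance, s) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(matches(instance, s) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema and sum(matches(instance, s) for s in schema["oneOf"]) != 1:
        return False
    if "not" in schema and matches(instance, schema["not"]):
        return False
    return True


multiple_of_5_or_3 = {"oneOf": [{"type": "number", "multipleOf": 5},
                                {"type": "number", "multipleOf": 3}]}
short_string_or_positive = {"anyOf": [{"type": "string", "maxLength": 5},
                                      {"type": "number", "minimum": 0}]}

print(matches(10, multiple_of_5_or_3))               # True
print(matches(15, multiple_of_5_or_3))               # False: matches both
print(matches("short", short_string_or_positive))    # True
print(matches(-5, short_string_or_positive))         # False
print(matches(42, {"not": {"type": "string"}}))      # True
```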

Applying subschemas conditionally:

  • The if, then, and else keywords (introduced in draft-07) allow a subschema to be applied based on the outcome of another schema. If the instance is valid against if, it must also be valid against then (and else is ignored). If it is invalid against if, it must be valid against else (and then is ignored). A missing then or else imposes no constraint.
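Conditional application can be sketched with a hypothetical mini-validator in standard-library Python. It supports only properties, required, const, pattern and if/then/else, and its const check uses plain Python == for brevity. The country and postal-code schema is an illustrative example. Putting required inside if matters: without it, an object with no country would pass the properties check and take the then branch.

```python
import re


def ok(instance, schema):
    """Mini-validator: properties, required, const, pattern, if/then/else."""
    if isinstance(schema, bool):
        return schema
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not ok(instance[name], sub):
                return False
        if any(name not in instance for name in schema.get("required", [])):
            return False
    if "const" in schema and instance != schema["const"]:  # simplified equality
        return False
    if ("pattern" in schema and isinstance(instance, str)
            and not re.search(schema["pattern"], instance)):
        return False
    if "if" in schema:
        # Pick the branch from the outcome of "if"; a missing branch acts like true.
        branch = "then" if ok(instance, schema["if"]) else "else"
        if not ok(instance, schema.get(branch, True)):
            return False
    return True


postal = {
    "if": {"properties": {"country": {"const": "United States of America"}},
           "required": ["country"]},
    "then": {"properties": {"postal_code": {"pattern": "^[0-9]{5}(-[0-9]{4})?$"}}},
    "else": {"properties": {"postal_code": {"pattern": "^[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$"}}},
}

print(ok({"country": "United States of America", "postal_code": "20500"}, postal))  # True
print(ok({"country": "Canada", "postal_code": "K1M 1M4"}, postal))                   # True
print(ok({"country": "Canada", "postal_code": "20500"}, postal))                     # False
```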

OpenAPI

OpenAPI Specification (formerly Swagger Specification) is an API description format for REST APIs. An OpenAPI file allows you to describe your entire API, including:

  • Available endpoints (/users) and operations on each endpoint (GET /users, POST /users)
  • Operation parameters, and the input and output for each operation
  • Authentication methods
  • Contact information, license, terms of use and other information
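The following hypothetical minimal OpenAPI 3.0 document shows how the pieces listed above fit together. It is written as a Python dict here; in practice it would usually live in a YAML or JSON file. The data model sits under components/schemas as JSON-Schema-like Schema Objects, and operations point to it with $ref. The title, path, contact address, schema and security scheme names are made up for illustration. The small list_operations helper, also hypothetical, walks the paths object to enumerate the endpoints and their operations.

```python
# Hypothetical minimal OpenAPI 3.0 document; all names are illustrative.
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Users API", "version": "1.0.0",
             "termsOfService": "https://example.com/terms",
             "contact": {"email": "api@example.com"},
             "license": {"name": "Apache 2.0"}},
    "paths": {
        "/users": {
            "get": {
                "summary": "List users",
                "parameters": [{"name": "limit", "in": "query",
                                "schema": {"type": "integer", "minimum": 1}}],
                "responses": {"200": {
                    "description": "A list of users",
                    "content": {"application/json": {"schema": {
                        "type": "array",
                        "items": {"$ref": "#/components/schemas/User"}}}}}},
            },
            "post": {
                "summary": "Create a user",
                "requestBody": {"content": {"application/json": {
                    "schema": {"$ref": "#/components/schemas/User"}}}},
                "responses": {"201": {"description": "Created"}},
            },
        }
    },
    "components": {
        # The data model: JSON-Schema-like Schema Objects.
        "schemas": {"User": {"type": "object", "required": ["name"],
                             "properties": {"name": {"type": "string"}}}},
        "securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}},
    },
    "security": [{"bearerAuth": []}],
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(doc):
    """Return (METHOD, path) pairs declared in an OpenAPI document."""
    return [(method.upper(), path)
            for path, item in doc["paths"].items()
            for method in item if method in HTTP_METHODS]


print(list_operations(openapi_doc))  # [('GET', '/users'), ('POST', '/users')]
```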

The complete OpenAPI Specification can be found on GitHub, in the OAI/OpenAPI-Specification repository.

Conclusion

This article has explored which tool to use and when. If you only need to define the schema of your data models, JSON Schema is a good option. But if you want to describe your entire API, it’s better to go with OpenAPI. I hope you have found this article helpful. Thank you for reading!
