
JSON Schema is a constraint system

Or: Why object-oriented programming is a mis-matched mental model for JSON Schema


I am a software engineer, REST API enthusiast, co-author of the JSON Schema specification, and occasional interactive theatre performer.

Did you come to JSON Schema with an object-oriented programming (OOP) background? Do you think of writing a schema as analogous to writing a class? These expectations are both common and understandable, but incorrect.

In this article, you'll learn:

  • What it means for JSON Schema to be a constraint system

  • How to think about designing constraints effectively

  • Why "constraint" encompasses more than just JSON Schema assertions

Constraints vs Definitions

This is an empty schema (in all versions[1] of JSON Schema):

{}

It allows everything.

This is an empty class in Java:

public class Empty {
}

It does nothing.

Building a Java class and a JSON Schema for the same data model works from opposite starting points and requires different approaches:

[Figure: empty Java class on the left, empty schema on the right, structured data model in the middle, with a "define" arrow from left to center and an "exclude" arrow from right to center, as described in the following paragraphs.]

Class definition systems are additive: You start from nothing and the more you specify, the more you can do. On the left, our empty Java class is a black disc with no other colors. You add fields to define the correct set of possible data.

Constraint systems are subtractive: You start with everything and the more you specify, the less you can do. On the right, our empty JSON Schema is a black disc covered by a chaotic riot of color blobs. You add keywords to exclude all incorrect data. This thought process runs in the opposite direction compared to defining a class in Java.

Both processes converge in the middle, where our correct data model is shown as three colors at the points of an equilateral triangle on a black disc. So let's look at how these different processes work.

Be forbidding!

If we were thinking about Java classes and JSON Schemas in parallel, our next step might be adding a property to each, using a single public boolean field for simplicity (colors and shapes just made a better visual!):

public class Empty {
    public boolean isOn;
}

{
    "properties": {
        "isOn": {
            "type": "boolean"
        }
    }
}

While these look analogous, they are very different. Constraints in JSON Schema constrain as little as possible. This maximizes flexibility for schema authors. All this schema says is "if the instance is an object, and if it has a property named isOn, then that property must be a boolean."
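
To make this concrete, here is a hand-rolled check in Python that implements only what this loose schema actually asserts. This is a sketch using just the standard library, not a real JSON Schema validator:

```python
import json

def loose_valid(instance):
    """Mirror the loose schema: constrain only objects that carry "isOn"."""
    if not isinstance(instance, dict):
        return True  # non-objects: the schema says nothing about them
    if "isOn" not in instance:
        return True  # "isOn" is not required
    return isinstance(instance["isOn"], bool)  # if present, must be boolean

# Everything here is valid, including non-objects:
for doc in ("42", '"hello"', "[1, 2]", "{}", '{"isOn": true}'):
    assert loose_valid(json.loads(doc))

# Only a present-but-non-boolean "isOn" is rejected:
assert not loose_valid({"isOn": "yes"})
```

Every branch that returns True is a case the schema simply says nothing about, which is exactly why so little is rejected.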

In contrast, Java classes are inherently objects with named fields. Java classes only have the fields you declare. They always have those fields, even if you don't use them. The fields are initialized automatically, which for booleans means being set to false.

That's four additional constraints that we need to add to our JSON Schema (using JavaScript comment syntax):

{
    "type": "object",               // forbid non-objects
    "required": ["isOn"],           // forbid objects without "isOn"
    "properties": {
        "isOn": {
            "type": "boolean",      // forbid non-boolean "isOn"
            "default": false        // forbid assuming "isOn" is true
        }
    },
    "additionalProperties": false   // forbid props that aren't "isOn"
}
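
Continuing the hand-rolled Python sketch, each added keyword becomes an explicit rejection. Note that "default" contributes no check at all, because it is an annotation rather than an assertion:

```python
def strict_valid(instance):
    """Mirror the stricter schema; "default" adds no validation check."""
    if not isinstance(instance, dict):
        return False  # "type": "object": forbid non-objects
    if "isOn" not in instance:
        return False  # "required": forbid objects without "isOn"
    if set(instance) != {"isOn"}:
        return False  # "additionalProperties": false: forbid other props
    return isinstance(instance["isOn"], bool)  # forbid non-boolean "isOn"

assert strict_valid({"isOn": False})             # the only shape that passes
assert not strict_valid(42)                      # non-objects now rejected
assert not strict_valid({})                      # missing "isOn"
assert not strict_valid({"isOn": True, "x": 1})  # extra property
assert not strict_valid({"isOn": "yes"})         # wrong type for "isOn"
```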

This might seem strange. Why would you want to consider a non-object valid if you are describing object properties?

When you're writing a Java class, you are starting with Java's inherent concept of a class, with all of its automatic behaviors (such as attribute initialization) and assumptions (classes always having exactly the set of attributes that were declared, no more, no less). Java knows that you only want the things you say you want.

On the other hand, JSON Schema has no idea what you're trying to do and can't make any assumptions for you. It doesn't know that you're trying to match an OOP language. Even if it did, you could be writing in Perl, where a reference to anything, even a number or string, can be "blessed" as a class instance and properties can be added or removed at any time.

This is why JSON Schema is a constraint system, and why you want to understand that instead of trying to use it as if it were something else.[2] It has to work with anything that might be done in JSON, and it relies on you to ask yourself, "have I forbidden everything I don't want?"

Constraints vs validation assertions

JSON Schema keywords have different behaviors, including assertions, which can cause validation to fail, and annotations, which cannot fail validation but do provide additional information about valid instances.

All of the JSON Schema keywords in the example above are assertions except "default", which is an annotation. But if you're using this schema for code generation, "default" is a constraint in that context. A "default" of true would require a generated Java class to initialize isOn to true. Non-assertions can be constraints in non-validation use cases!
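
To illustrate the code-generation case, here is a hypothetical sketch (the function name and the tiny type map are made up for illustration) showing how "default" constrains generated code even though it never affects validation:

```python
import json

def java_field(name, prop_schema):
    """Emit a Java field declaration from one property subschema (sketch)."""
    java_types = {"boolean": "boolean", "string": "String", "integer": "int"}
    jtype = java_types[prop_schema["type"]]
    if "default" not in prop_schema:
        return f"public {jtype} {name};"
    # json.dumps renders True/False as Java-compatible true/false literals
    return f"public {jtype} {name} = {json.dumps(prop_schema['default'])};"

# A "default" of true constrains the generator to initialize isOn to true:
assert java_field("isOn", {"type": "boolean", "default": True}) \
    == "public boolean isOn = true;"
```

A real generator would handle far more types and keywords; the point is only that the annotation drives the initializer.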

Conclusion

In this article, you've learned that:

  • JSON Schema is a constraint system because it can't know how you will use it

  • Java and similar languages know how you will use them, and make assumptions accordingly

  • The empty JSON Schema allows everything

  • The schema design mindset is: "Have I forbidden everything I don't want?"

  • Whether a keyword functions as a constraint depends on how you are using the schema

Please let me know what you think in the comments, and please click the "Follow" button so you don't miss the next two posts: first, an explanation of dynamic scopes, which you'll want to understand to get the most out of the post after it: When to use "additionalProperties" vs "unevaluatedProperties"!


  1. In real-world usage, you should always specify "$schema", as different implementations handle its absence differently. Some assume a default version (which might not be the one you expected), and others won't process such schemas at all because they don't want to risk giving an incorrect validation outcome with an inaccurate default assumption. To keep examples more readable, I'll often omit "$schema" when it doesn't change how the example is understood.

  2. There have been some misguided attempts to reconcile OOP and constraints. For example, recent versions of the popular AJV implementation of JSON Schema enable a number of "strict mode" options by default. Many of these attempt to make JSON Schema behave more like a data definition system, but the resulting behavior is not compliant with the JSON Schema specification. AJV's strict mode prevents certain valid schemas from being used rather than changing their validation outcome, but this still breaks interoperability, as many of the things it prevents have valid use cases: schemas written for use with compliant implementations will fail with AJV. This is why the JSON Schema implementations page now notes implementations with noncompliant default behavior. In a future article, we'll look at when AJV's strict mode is useful, and when it is problematic.


A

This is a very eloquent articulation of an aspect of JSON Schema I've struggled to describe, making do with "in code we often describe what a thing is; JSON Schema describes what a thing isn't".

May I ask though: why? JSON Schema is--I presume--intended to serve programmatic uses. Why choose a constraint model that's at odds with a structural (or "object oriented") approach?

C

While I understand the point this is making, it seems that many of the popular uses of JSON Schema are for "Definitions" (API contracts). These are typically then used directly for "Code Generation".

Would you say that this is outside the scope of what JSON Schema was designed for?

H

Yes, and since I'm also very involved in the OpenAPI Specification, I'm very familiar with this mismatch and the problems it causes. We did try to address that in an appendix to the modern JSON Schema drafts, to show how to bridge that gap, but it hasn't really caught on so far. (Apologies for the very long delay in replying; see my "Coming Out of Hibernation" post for an explanation.)

R

Is it worth tying these ideas into the concepts of open-world and closed-world views, and the consequent fairly natural match with RDF and JSON-LD?

H

My understanding is that an open world allows for indeterminate truth values. Unless there's an error in the schema, it's always possible to determine whether a given instance is valid with respect to a given schema. It's just that the starting point (the empty schema) allows everything. So that would seem to me to be a closed world approach.

D

Does indeterminacy in an open world express a limitation of modeling a world, or indeterminacy in the model itself? (The map is not the territory, but given a map can truth be indeterminate?)

All sorts of vagueness exists in the real world; "Fred is bald" includes at least two kinds: "which Fred?" and "what is bald?" But does an open world model require that subjects have a unique instance id and that the "bald" attribute is a binary type (or an integer threshold for number of hairs on the head)?

I'd assume that an open world model does not allow vagueness-based indeterminate truth (it follows the law of the excluded middle). Are there different scenarios where indeterminate truth is permitted in the model?

(Attributes can be optional, and if a subject does not have a "bald" attribute then the question cannot be answered for that subject. But that applies to both closed and open worlds.)