JSON in Traditional Databases

JSON in Traditional Databases

Using JSON in traditional databases is possible! Check out on how to do it properly in MySQL and Postgres

In early 2010s, one of the most popular emerging database was Mongo. It was popular as it was a Document or JSON based database with flexible schema. Schema-less was enticing in those days of moving fast and breaking things. Even though Mongo has crossed the chasm of hype, it is still one of the most popular databases. It also tried to bring back SQL and ACID as these were critical for a well-positioned databases. In the RDBMS world, the rise of Mongo gave an opportunity to embed JSON in the traditional RDBMS. All traditional databases such as MySQL, Postgres, Oracle and SQL Server started providing JSON support. Of course, JSON as a column type is still not a great choice due to performance penalties and loosing a well-defined domain, it is still a good use-case for some scenarios.

When JSON can be a good fit

There are some scenarios when JSON type columns can be considered to be introduced into a traditional DB. These are:

Tenant-specific data

Say you have a white label product where most of things can be put in a traditional table format. But there are some fields in different tenants which are not overlapping. For example, for an application like Shopify, some merchants would want to additional product metadata which can lead to too many columns. In such a case, you could use a JSON column for storing metadata and not having a lot of NULL-based columns for the rest of the tenants.

Telemetry data

You should use a Time-series Database for such a use-case; but if you want to leverage databases like Postgres for storing Metric or telemetry data; you coud use a JSON column

Feature flags

If you are doing a lot of A/B testing and you want to store the feature flags, you could go with a JSON column. Similar to tenant-specific data; it helps in reducing a lot of BOOLEAN columns. You can have a single column like this

{
  "SHOW_PRICES":true,
  "SHOW_DYNAMIC_PRICING":true,
  "SHOW_MFG_DATE":false,
  "SHOW_STOCK":true,
}

But keep in mind when not to use JSON. Just because you can, does not mean you should.

JSON type Support

Postgres

Postgres has JSON and JSONB. Both the types accept JSON.

  1. JSON Introduced in Postgres 9.2 in 2012. It is a string with additional tools to do some JSON validation.

  2. JSONB Introduced in Postgres 9.4 in 2014. JSONB is stored in binary format with indexing capabilities to optimize reads.The other advatage of JSONB over JSOn is its efficient JSON parsing.

JSON vs JSONB table

WhatJSONJSONB
Preserving WhitespaceJSON preserves whitespacesJSONB does not preserve whitespace as it is binary format
Ordering of keysJSON preserves key orderingJSONB does not preserve key ordering
Duplicate keysJSON preserves duplicate keys. But this is invalid JSON so issues might arise when reading and validating JSONJSONB removes all duplicate keys and keeps the last one.
Ingestion performanceIngestion is faster in JSONIngestion is slower in JSONB
JSON ProvessingSlow JSON procession as it is a text fieldJSON processing is faster as the binary format has efficient storage and reading capabilities for JSON keys.

Caveats of Postgres JSON types

  1. Malformed JSON:Both give error if you do not add quote (') for key and value.

  2. Duplicate Key: No error!

    • JSON: Stores as it is

    • JSONB: Stores only last key, which might lead to confusion and loss of information when JSON validation is not done at application layer and it is stored in database. MySQL throws an error during insert.

SQL Script

INSERT into json_postgres (json_col_json_type) values ('{"foo":"bar","foo":"bar"}');
INSERT into json_postgres (json_col_jsonb_type) values ('{"foo":"bar","foo":"bar"}');

OUTPUT:

id|json_col_json_type       |json_col_jsonb_type|
--+-------------------------+-------------------+
 1|{"foo":"bar","foo":"bar"}|                   |
 2|                         |{"foo": "bar"}     |

MySQL

JSON has been supported from v5.7.8.

Features include:

  1. Validating JSON documents during insert.

If you have a invalid JSON for a JSON type, MySQL will throw an error.

Example for this query: INSERT into json_mysql (json_col_json_type) values ('{"foo":bar}') Error comes as.

SQL Error [3140] [22001]: Data truncation: Invalid JSON text: "Invalid value." at position 7 in value for column 'json_mysql.json_col_json_type'.
  1. Storage in Binary format for fast access

JSON type is in binary format which helps in faster access of JSON keys and members. If you stores the JSON in text format and you did a SELECT statement; it would do the following things

SELECT "foo" key-> JSON Parser -> Scan JSON for "foo" sequentially

On the other hand, JSON has a lookup table. Lookup table contains pointers all the the keys in the JSON document which can be queries efficiently with Binary Search.

DatabaseJSON
Postgres2 data types; JSON and JSONB
MySQL2 data types; JSON and TEXT
Oracle3 data types. OSON Oracle's binary format introduced in 21c ; Can also store in BLOB, CLOB as well
Microsoft SQL serverNo specific JSON type. Store in NVARCHAR

SQL example

MySQL

  1. Create table
create table json_mysql (

    id bigint auto_increment primary key, 
     -- JSON is stored in binary format with metadata. Has storage overhead, but no runtime invalid JSON issues
    `json_col_json_type` JSON,
    `json_col_text_type` text  

);
  1. Insert data
INSERT into json_mysql (json_col_json_type) values ('{"foo":"bar"}');
INSERT into json_mysql (json_col_text_type) values ('{"foo":"bar"}');
select `json_col_json_type`->>'$.foo' from json_mysql;
select `json_col_text_type`->>'$.foo' from json_mysql;
  1. Insert invalid JSON
INSERT into json_mysql (json_col_json_type) values ('{"foo":bar}') -- Throws error
select `json_col_json_type`->>'$.foo' from json_mysql;
INSERT into json_mysql (json_col_text_type) values ('{"foo":bar}') -- No error
select `json_col_text_type`->>'$.foo' from json_mysql;

JSON Methods

There are 20 methods

  1. JSON_OBJECT
SELECT JSON_OBJECT('foo', 'bar'); 

-- Output
|JSON_OBJECT('foo', 'bar')|
|-------------------------+
|{"foo": "bar"}           |
  1. JSON_ARRAY and JSON_OBJECT

SELECT JSON_OBJECT('employeeIds', JSON_ARRAY('e1', 'e2', 'e3'));
-- Output
JSON_OBJECT('employeeIds', JSON_ARRAY('e1', 'e2', 'e3'))|
--------------------------------------------------------+
{"employeeIds": ["e1", "e2", "e3"]}                     |
  1. JSON_CONTAINS_PATH
SELECT `json_col_json_type`, JSON_CONTAINS_PATH(`json_col_json_type`, 'one', '$.foo') as count_foo from json_mysql;

-- Output
json_col_json_type|count_foo|
------------------+---------+
{"foo": "bar"}    |        1|
  1. Check size
SELECT 
max(JSON_STORAGE_SIZE(`json_col_text_type`)) as max_cell_json_size,
max(JSON_STORAGE_SIZE(`json_col_json_type`)) as max_cell_type_size,
max(char_length(sample_varchar_1)) as max_cell_varchar2_size
FROM json_mysql;

Postgres

  1. Create table
create table json_postgres (
    id serial primary key, 
    json_col_json_type JSON, -- JSON is stored in binary format with metadata. This has storage overhead, but not runtime invalid JSON issues like Text 
    json_col_text_type text,
    json_col_jsonb_type JSONB
);
  1. Insert data
INSERT into json_postgres (json_col_json_type) values ('{"foo":"bar","foo":"bar"}');   -- no error. stores as it is
INSERT into json_postgres (json_col_jsonb_type) values ('{"foo":"bar1","foo":"bar2"}');-- no error. but stores only last key

JSON Methods

Postgres JSON functions can be found here: Postgres JSON Functions

Index on JSON

For faster access of JSON data types, databases can be indexed for faster access.

MySQL

  1. Step 1: Check the POSSIBLE_KEYS attribute which decides the Index key
EXPLAIN SELECT * FROM json_mysql;

Output:
id|select_type|table     |type|possible_keys|key|key_len|ref|rows|
--+-----------+----------+----+-------------+---+-------+---+----+
 1|SIMPLE     |json_mysql|ALL |             |   |       |   |   3|
  1. Step2: JSON supports indexing iff it is a generated column.
ALTER TABLE json_mysql ADD COLUMN foo VARCHAR(5)

GENERATED ALWAYS as (json_col_json_type->>'$.foo');
  1. Step3: Add index. Note that if you have an array like {"foo":["bar"]};, then this will lead to B-Tree issue as JSON value will not be valid for the ["bar"] in B-Tree index. Therefore, keep in mind that your JSON key maps to an normal object and not an array.
-- Check for NULL values to avoid incorrect B-Tree Indexes
SELECT JSON_UNQUOTE(JSON_EXTRACT(json_col_json_type, '$.foo')) FROM json_mysql;

-- Create index using BTree
ALTER TABLE json_mysql ADD INDEX foo (foo) USING BTREE;
  1. Step4: See POSSIBLE_KEYS for indexing used by MySQL
EXPLAIN SELECT * FROM json_mysql jm WHERE foo = 'bar';
Output:
id|select_type|table     |type|possible_keys|key|key_len|ref  |rows|
--+-----------+----------+----+-------------+---+-------+-----+----+
 1|SIMPLE     |json_mysql|ref |foo          |foo|23     |const|   1|

Postgres

As discussed before, JSONB type in Postgres allows for indexing of JSON data. There are 3 types of indexes:

  1. B-Tree Most common form of index data structure. It creates a tree that will he balanced when new nodes are added. While reading, the B-Tree will be. traversed to find the key and return the node with the data you are querying for.

  2. GIN (Generalized Inverted indexes) B-Tree are great for finding normal operators with string values. But Postgres databases can store mutlitude type of data types such as JSONB, Arrays, Range types, Full-text search etc.

  3. Hash-indexes Useful for equality operators for exact matches. It is faster than B-Tree for such operations with less memory footprint when compared to B-Tree and GIN indexes.

Memory consideration

Postgres

The page size of storing 1 row in Postgres is (2KB, 8KB or other size as configured). As JSON can lead to bigger page sizes, it can lead to overflow of the limit of 2KB. Postgres has a technique of storing overflow/oversize rows using TOAST(The Oversize Attribute Storage Technique) If data goes beyond 2KB, it will store the data to an out-of-line storage.

Postgres solves issue of row-oversizes via various techniques:

  1. Compression

  2. TOAST table for out-of-line storage

  3. Combine Compression and Toast table

There are 4 strategies of TOAST storage:

  1. Plain

  2. Extended

  3. External

  4. Main

Storage typeCompressionOut-of-line storageUsageOrder of compression + out-of-line storage
PlainNoNoShort rows that do not go beyond TOAST LimitNot applicable
ExtendedYesYesJSONB and other data typesCompression done first; if row is still big, do OOL storage
MainYesNoIf TOAST data is to be stored in same tablespace as the normal non-verflow dataOnly compression
ExternalNoYesIf you want faster lookups than MAIN. But this comes at storage size penalty.Only OOL

MySQL

Since MySQL has decoupled storage engines such as InnoDB, MyISAM etc; hence handling would be different for in other separate engines. Considering the most popular and default storage engine which is InnoDB; there are some configuration to handle large JSON objects.

Similar to Postgres, if the row size goes beyond the default configured 8K page size, the JSON document can be moved to an external overflow space with a pointer to current row.

There are 4 row format for handling overflows:

  1. Redundant Legacy row format available before MySQL 5.0.3 only. Can store maximum indexes of 768 bytes for any type of large column such as VARCHAR, BLOB, TEXT. Overflow is put into new overflow page with a 20 byte pointer from main page -> overflow page. If overflow page overflows (inception!), then a new pointer is created from overflow page 1 -> overflow page 2 -> ...

  2. Compact Similar to Redundant, but has storage savings of upto 20% due to compaction.

  3. Dynamic Dynamic storage format tries to keep all data in one page. If data is too big, whole data is stored in seperate page instead of breaking up the data and using pointers to map from map to overflow page. Index Key prefix length is 3072 bytes instead of 768 bytes.

  4. Compressed Similar index key prefix length of 3072 bytes. It does page-level compression which is not done in previous row formats

Conclusion

JSON is supported in Relational Databases, but the implementation and nuance on its usage is needed to be understood in order to makes a decision on how and if to use JSONs in databases such as Oracle, MySQL and Postgres!

Did you find this article valuable?

Support Ayman Patel by becoming a sponsor. Any amount is appreciated!