JSON in Traditional Databases
Using JSON in traditional databases is possible! Check out how to do it properly in MySQL and Postgres.
In the early 2010s, one of the most popular emerging databases was MongoDB. It was popular because it was a document (JSON-based) database with a flexible schema. Schema-less was enticing in those days of moving fast and breaking things. Even though Mongo has crossed the chasm of hype, it is still one of the most popular databases. It also tried to bring back SQL and ACID, as these are critical for a well-positioned database. In the RDBMS world, the rise of Mongo gave an opportunity to embed JSON in traditional RDBMSs. All traditional databases such as MySQL, Postgres, Oracle and SQL Server started providing JSON support. Although JSON as a column type is still not a great choice in general, due to performance penalties and losing a well-defined domain, it is a good fit for some scenarios.
When JSON can be a good fit
There are some scenarios where JSON columns are worth considering in a traditional DB. These are:
Tenant-specific data
Say you have a white-label product where most things fit a traditional table format, but some fields differ between tenants and do not overlap. For example, in an application like Shopify, some merchants would want additional product metadata, which can lead to too many columns. In such a case, you could use a JSON column for storing the metadata instead of having a lot of NULL-filled columns for the rest of the tenants.
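As a minimal sketch (the table and column names here are hypothetical, in Postgres syntax): shared attributes stay as regular columns, while tenant-specific extras live in a single JSON column.

-- Common attributes are normal columns; tenant-specific extras go into JSONB
create table products (
  id          bigserial primary key,
  tenant_id   bigint  not null,
  name        text    not null,
  price_cents integer not null,
  metadata    jsonb   -- e.g. {"warranty_months": 24, "battery_type": "AAA"}
);

-- Only tenants that need the extra fields populate them
INSERT into products (tenant_id, name, price_cents, metadata)
values (42, 'Wall Clock', 1999, '{"warranty_months": 24, "battery_type": "AAA"}');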
Telemetry data
You should normally use a time-series database for this use case, but if you want to leverage a database like Postgres for storing metric or telemetry data, you could use a JSON column.
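For example, a rough sketch in Postgres (table and column names are illustrative, and this is not a recommendation over a real time-series DB): each reading keeps its variable labels in a JSONB column.

create table metrics (
  id          bigserial primary key,
  recorded_at timestamptz not null default now(),
  name        text not null,               -- e.g. 'cpu_usage'
  value       double precision not null,
  labels      jsonb                        -- e.g. {"host": "web-1", "region": "eu"}
);

-- Filter on a label without needing a dedicated column for it
SELECT recorded_at, value from metrics
WHERE name = 'cpu_usage' and labels->>'host' = 'web-1';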
Feature flags
If you are doing a lot of A/B testing and you want to store feature flags, you could go with a JSON column. Similar to tenant-specific data, it helps avoid a lot of BOOLEAN columns. You can have a single column like this:
{
  "SHOW_PRICES": true,
  "SHOW_DYNAMIC_PRICING": true,
  "SHOW_MFG_DATE": false,
  "SHOW_STOCK": true
}
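Reading a single flag back out is then a one-liner. A sketch in Postgres, assuming the flags above live in a JSONB column called feature_flags on a hypothetical tenants table:

-- Using the ->> text extractor and a boolean cast
SELECT id from tenants
WHERE (feature_flags->>'SHOW_DYNAMIC_PRICING')::boolean = true;

-- Or, equivalently for JSONB, using the containment operator
SELECT id from tenants
WHERE feature_flags @> '{"SHOW_DYNAMIC_PRICING": true}';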
But keep in mind when not to use JSON. Just because you can, does not mean you should.
JSON type Support
Postgres
Postgres has two JSON types, JSON and JSONB. Both accept JSON input.
JSON Introduced in Postgres 9.2 in 2012. It is stored as a plain string, with additional tools to do some JSON validation.
JSONB Introduced in Postgres 9.4 in 2014. JSONB is stored in a binary format with indexing capabilities to optimize reads. The other advantage of JSONB over JSON is its efficient JSON parsing.
JSON vs JSONB table
| What | JSON | JSONB |
|------|------|-------|
| Whitespace | Preserves whitespace | Does not preserve whitespace, as it is a binary format |
| Ordering of keys | Preserves key ordering | Does not preserve key ordering |
| Duplicate keys | Preserves duplicate keys. This is invalid JSON, so issues might arise when reading and validating it later | Removes duplicate keys and keeps only the last one |
| Ingestion performance | Faster | Slower |
| JSON processing | Slower, as the text field has to be parsed on each access | Faster, as the binary format has efficient storage and lookup of JSON keys |
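You can see the whitespace and key-ordering rows of this table directly by casting the same literal to both types. A quick sketch (the exact output formatting may vary slightly by Postgres version):

SELECT '{"b": 1,   "a": 2}'::json  as as_json,
       '{"b": 1,   "a": 2}'::jsonb as as_jsonb;
-- as_json : {"b": 1,   "a": 2}   -- text kept verbatim, order and whitespace preserved
-- as_jsonb: {"a": 2, "b": 1}     -- normalized binary form, keys reordered and whitespace dropped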
Caveats of Postgres JSON types
Malformed JSON: Both types throw an error if keys and string values are not properly quoted (e.g. '{"foo":bar}' is rejected).
Duplicate keys: No error!
JSON: Stores the document as-is, duplicates included.
JSONB: Stores only the last occurrence of the key, which might lead to confusion and loss of information if JSON validation is not done at the application layer before the value is stored in the database. (MySQL, by contrast, normalizes duplicate keys on insert rather than raising an error.)
SQL Script
INSERT into json_postgres (json_col_json_type) values ('{"foo":"bar","foo":"bar"}');
INSERT into json_postgres (json_col_jsonb_type) values ('{"foo":"bar","foo":"bar"}');
OUTPUT:
id|json_col_json_type        |json_col_jsonb_type|
--+--------------------------+-------------------+
 1|{"foo":"bar","foo":"bar"} |                   |
 2|                          |{"foo": "bar"}     |
MySQL
JSON has been supported since v5.7.8.
Features include:
- Validating JSON documents during insert.
If you have invalid JSON for a JSON-typed column, MySQL will throw an error.
Example for this query: INSERT into json_mysql (json_col_json_type) values ('{"foo":bar}')
The error comes back as:
SQL Error [3140] [22001]: Data truncation: Invalid JSON text: "Invalid value." at position 7 in value for column 'json_mysql.json_col_json_type'.
- Storage in Binary format for fast access
The JSON type is stored in a binary format, which helps with faster access to JSON keys and members. If you store the JSON in a text column and run a SELECT statement, it does the following:
SELECT "foo" key -> JSON parser -> scan the JSON for "foo" sequentially
The JSON type, on the other hand, has a lookup table. The lookup table contains pointers to all the keys in the JSON document, which can be queried efficiently with binary search.
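One way to see this difference with the json_mysql table defined later in this post: extraction from the JSON column works directly on the binary format, while the same extraction from the TEXT column has to parse the string each time (a sketch, using only standard MySQL JSON functions):

-- JSON column: value is already in binary format, key lookup uses the internal pointer table
SELECT `json_col_json_type`->>'$.foo' from json_mysql;

-- TEXT column: the string must be parsed into a JSON value on every call before the key can be found
SELECT JSON_UNQUOTE(JSON_EXTRACT(`json_col_text_type`, '$.foo')) from json_mysql;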
| Database | JSON support |
|----------|--------------|
| Postgres | 2 data types: JSON and JSONB |
| MySQL | 2 data types: JSON and TEXT |
| Oracle | 3 data types: OSON (Oracle's binary format, introduced in 21c); can also store in BLOB or CLOB |
| Microsoft SQL Server | No specific JSON type; store in NVARCHAR |
SQL example
MySQL
- Create table
create table json_mysql (
id bigint auto_increment primary key,
-- JSON is stored in binary format with metadata. Has storage overhead, but no runtime invalid JSON issues
`json_col_json_type` JSON,
`json_col_text_type` text
);
- Insert data
INSERT into json_mysql (json_col_json_type) values ('{"foo":"bar"}');
INSERT into json_mysql (json_col_text_type) values ('{"foo":"bar"}');
select `json_col_json_type`->>'$.foo' from json_mysql;
select `json_col_text_type`->>'$.foo' from json_mysql;
- Insert invalid JSON
INSERT into json_mysql (json_col_json_type) values ('{"foo":bar}') -- Throws error
select `json_col_json_type`->>'$.foo' from json_mysql;
INSERT into json_mysql (json_col_text_type) values ('{"foo":bar}') -- No error
select `json_col_text_type`->>'$.foo' from json_mysql;
JSON Methods
MySQL provides a number of JSON functions; a few of them:
- JSON_OBJECT
SELECT JSON_OBJECT('foo', 'bar');
-- Output
|JSON_OBJECT('foo', 'bar')|
|-------------------------+
|{"foo": "bar"} |
- JSON_ARRAY and JSON_OBJECT
SELECT JSON_OBJECT('employeeIds', JSON_ARRAY('e1', 'e2', 'e3'));
-- Output
JSON_OBJECT('employeeIds', JSON_ARRAY('e1', 'e2', 'e3'))|
--------------------------------------------------------+
{"employeeIds": ["e1", "e2", "e3"]} |
- JSON_CONTAINS_PATH
SELECT `json_col_json_type`, JSON_CONTAINS_PATH(`json_col_json_type`, 'one', '$.foo') as count_foo from json_mysql;
-- Output
json_col_json_type|count_foo|
------------------+---------+
{"foo": "bar"} | 1|
- Check size
SELECT
max(JSON_STORAGE_SIZE(`json_col_json_type`)) as max_cell_json_size,
max(JSON_STORAGE_SIZE(`json_col_text_type`)) as max_cell_text_size
-- max(char_length(sample_varchar_1)) as max_cell_varchar_size -- only if the table has such a VARCHAR column
FROM json_mysql;
Postgres
- Create table
create table json_postgres (
id serial primary key,
json_col_json_type JSON, -- JSON is stored as text but validated on insert, unlike a plain text column
json_col_text_type text,
json_col_jsonb_type JSONB -- JSONB is stored in a binary format: some storage overhead, but faster key access
);
- Insert data
INSERT into json_postgres (json_col_json_type) values ('{"foo":"bar","foo":"bar"}'); -- no error. stores as it is
INSERT into json_postgres (json_col_jsonb_type) values ('{"foo":"bar1","foo":"bar2"}');-- no error. but stores only last key
JSON Methods
Postgres JSON functions can be found here: Postgres JSON Functions
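A few of them as a quick taste, run against the json_postgres table created above (these are standard Postgres functions; see the docs linked above for the full list):

-- Extract a key as text
SELECT json_col_jsonb_type->>'foo' from json_postgres;

-- Build JSON from values
SELECT jsonb_build_object('employeeIds', jsonb_build_array('e1', 'e2', 'e3'));

-- Return the document with one key replaced
SELECT jsonb_set(json_col_jsonb_type, '{foo}', '"baz"') from json_postgres;

-- Pretty-print a JSONB value
SELECT jsonb_pretty(json_col_jsonb_type) from json_postgres;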
Index on JSON
JSON columns can be indexed for faster access to the data inside them.
MySQL
- Step 1: Check the POSSIBLE_KEYS attribute, which shows the index key the query can use.
EXPLAIN SELECT * FROM json_mysql;
Output:
id|select_type|table |type|possible_keys|key|key_len|ref|rows|
--+-----------+----------+----+-------------+---+-------+---+----+
1|SIMPLE |json_mysql|ALL | | | | | 3|
- Step 2: MySQL supports indexing a JSON value only if it is exposed as a generated column.
ALTER TABLE json_mysql ADD COLUMN foo VARCHAR(5)
GENERATED ALWAYS as (json_col_json_type->>'$.foo');
- Step 3: Add the index. Note that if you have an array like {"foo":["bar"]}, this will lead to a B-Tree issue, because the extracted value ["bar"] is not a valid scalar for a B-Tree index. Therefore, make sure your JSON key maps to a plain scalar value and not an array.
-- Check for NULL values to avoid incorrect B-Tree Indexes
SELECT JSON_UNQUOTE(JSON_EXTRACT(json_col_json_type, '$.foo')) FROM json_mysql;
-- Create index using BTree
ALTER TABLE json_mysql ADD INDEX foo (foo) USING BTREE;
- Step 4: Check POSSIBLE_KEYS again to see the index used by MySQL.
EXPLAIN SELECT * FROM json_mysql jm WHERE foo = 'bar';
Output:
id|select_type|table |type|possible_keys|key|key_len|ref |rows|
--+-----------+----------+----+-------------+---+-------+-----+----+
1|SIMPLE |json_mysql|ref |foo |foo|23 |const| 1|
Postgres
As discussed before, the JSONB type in Postgres allows indexing of JSON data. There are 3 types of indexes:
B-Tree The most common index data structure. It creates a tree that is kept balanced as new nodes are added. While reading, the B-Tree is traversed to find the key and return the node with the data you are querying for.
GIN (Generalized Inverted Index) B-Trees are great for ordinary operators on scalar values such as strings. But Postgres can also store composite data types such as JSONB, arrays, range types and full-text search documents; GIN indexes the individual keys and elements inside such values, which is what makes JSONB operators like containment (@>) fast.
Hash indexes Useful for equality operators, i.e. exact matches. For such operations they can be faster than a B-Tree, with a smaller memory footprint compared to B-Tree and GIN indexes.
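As a sketch, here is how each of these index types can be created on the json_postgres table from earlier (index names are illustrative):

-- GIN index over the whole JSONB document; speeds up containment queries such as @>
CREATE INDEX idx_jsonb_gin ON json_postgres USING GIN (json_col_jsonb_type);

-- B-Tree expression index on a single extracted key
CREATE INDEX idx_jsonb_foo_btree ON json_postgres ((json_col_jsonb_type->>'foo'));

-- Hash expression index for pure equality lookups on that key
CREATE INDEX idx_jsonb_foo_hash ON json_postgres USING HASH ((json_col_jsonb_type->>'foo'));

-- Queries that can use them
SELECT * from json_postgres WHERE json_col_jsonb_type @> '{"foo": "bar"}';  -- GIN
SELECT * from json_postgres WHERE json_col_jsonb_type->>'foo' = 'bar';      -- B-Tree or Hash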
Memory consideration
Postgres
Postgres stores rows in fixed-size pages (8KB by default, or another size as configured at build time). Since JSON values can be large, a row can easily exceed the roughly 2KB threshold beyond which Postgres no longer keeps everything in-line. Postgres handles such overflow/oversize values with TOAST (The Oversized-Attribute Storage Technique): if data goes beyond the threshold, it is moved to out-of-line storage.
Postgres solves the row-oversize issue via various techniques:
- Compression
- TOAST table for out-of-line storage
- A combination of compression and the TOAST table
There are 4 strategies of TOAST storage:
- Plain
- Extended
- External
- Main
| Storage type | Compression | Out-of-line storage | Usage | Order of compression + out-of-line storage |
|--------------|-------------|---------------------|-------|--------------------------------------------|
| Plain | No | No | Short rows that do not go beyond the TOAST limit | Not applicable |
| Extended | Yes | Yes | JSONB and other data types | Compression is done first; if the row is still too big, move it out-of-line |
| Main | Yes | No | If TOAST data is to be stored in the same place as the normal non-overflow data | Only compression |
| External | No | Yes | If you want faster lookups than MAIN, at a storage-size penalty | Only out-of-line storage |
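The storage strategy is set per column and can be inspected and changed. A short sketch against the json_postgres table (psql's \d+ also shows the current Storage setting per column):

-- Force out-of-line storage without compression for the JSONB column
ALTER TABLE json_postgres ALTER COLUMN json_col_jsonb_type SET STORAGE EXTERNAL;

-- Find the TOAST table that backs the main table
SELECT reltoastrelid::regclass from pg_class WHERE relname = 'json_postgres';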
MySQL
MySQL has decoupled storage engines such as InnoDB and MyISAM, so overflow handling differs per engine. Considering the most popular and default storage engine, InnoDB, there are some configurations for handling large JSON objects.
Similar to Postgres, if a row does not fit in the page (16KB by default in InnoDB), the JSON document can be moved to external overflow storage with a pointer kept in the original row.
There are 4 row formats for handling overflows:
Redundant The legacy row format, and the only one available before MySQL 5.0.3. It can keep at most a 768-byte prefix of any large column such as VARCHAR, BLOB or TEXT in the main page. The overflow is put into a new overflow page, with a 20-byte pointer from main page -> overflow page. If the overflow page itself overflows (inception!), then a new pointer is created from overflow page 1 -> overflow page 2 -> ...
Compact Similar to Redundant, but with storage savings of up to 20% due to a more compact representation.
Dynamic The dynamic row format tries to keep all data in one page. If the data is too big, the whole value is stored in a separate overflow page, instead of breaking the data up and using pointers to map from the main page to the overflow page. The index key prefix length is 3072 bytes instead of 768 bytes.
Compressed Has the same 3072-byte index key prefix length, and additionally does page-level compression, which the previous row formats do not.
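The row format is set per table and can be checked and changed. A quick sketch for the json_mysql table used earlier (DYNAMIC is the default in recent MySQL versions):

-- See which row format the table currently uses
SELECT table_name, row_format from information_schema.tables
WHERE table_name = 'json_mysql';

-- Switch the table to a different row format
ALTER TABLE json_mysql ROW_FORMAT = DYNAMIC;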
Conclusion
JSON is supported in relational databases, but the implementation and the nuances of its usage need to be understood in order to decide whether and how to use JSON in databases such as Oracle, MySQL and Postgres!