> ## Documentation Index
> Fetch the complete documentation index at: https://private-7c7dfe99-mintlify-3a82795f.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

> Documentation for the JSON data type in ClickHouse, which provides native support for working with JSON data

# JSON Data Type

<Card title="Looking for a guide?" href="/concepts/best-practices/json-type" icon="book">
  Check out our JSON best practice guide for examples, advanced features and considerations for using the JSON type.
</Card>

The `JSON` type stores JavaScript Object Notation (JSON) documents in a single column.

<Note>
  In ClickHouse Open-Source JSON data type is marked as production ready in version 25.3. It's not recommended to use this type in production in previous versions.
</Note>

To declare a column of `JSON` type, you can use the following syntax:

```sql theme={null}
<column_name> JSON
(
    max_dynamic_paths=N,
    max_dynamic_types=M,
    some.path TypeName,
    SKIP path.to.skip,
    SKIP REGEXP 'paths_regexp'
)
```

Where the parameters in the syntax above are defined as:

| Parameter                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Default Value |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| `max_dynamic_paths`         | An optional parameter indicating how many paths can be stored separately as sub-columns across single block of data that is stored separately (for example across single data part for MergeTree table). <br /><br />If this limit is exceeded, all other paths will be stored together in a single structure called [shared data](#shared-data-structure).<br /><br />There are also [ways](#controlling-the-number-of-dynamic-paths) how to change the limit on dynamic paths without changing this parameter. | `1024`        |
| `max_dynamic_types`         | An optional parameter between `1` and `255` indicating how many different data types can be stored separately inside a single path column with type `Dynamic` across single block of data that is stored separately (for example across single data part for MergeTree table). <br /><br />If this limit is exceeded, all new types will be stored together in a single structure called `shared variant`.                                                                                                       | `32`          |
| `some.path TypeName`        | An optional type hint for particular path in the JSON. Such paths will be always stored as sub-columns with specified type.                                                                                                                                                                                                                                                                                                                                                                                      |               |
| `SKIP path.to.skip`         | An optional hint for particular path that should be skipped during JSON parsing. Such paths will never be stored in the JSON column. If specified path is a nested JSON object, the whole nested object will be skipped.                                                                                                                                                                                                                                                                                         |               |
| `SKIP REGEXP 'path_regexp'` | An optional hint with a regular expression that is used to skip paths during JSON parsing. All paths that match this regular expression will never be stored in the JSON column.                                                                                                                                                                                                                                                                                                                                 |               |

<h2 id="when-to-use-json-type">
  When to use the `JSON` Type
</h2>

The `JSON` type is designed for querying, filtering, and aggregating specific fields within JSON objects that have dynamic or unpredictable structures. It achieves this by splitting JSON objects into separate sub-columns, which dramatically reduces data read and speeds up queries on selected fields compared to alternatives like `Map` or parsing strings.

**However, this comes with important trade-offs:**

* Slower `INSERT`s - Splitting JSON into sub-columns, performing type inference, and managing flexible storage structures makes inserts slower compared to storing JSON as a simple `String` column.
* Slower when reading entire objects - If you need to retrieve complete JSON documents (rather than specific fields), the `JSON` type is slower than reading from a `String` column. The overhead of reconstructing objects from separate sub-columns provides no benefit when you're not doing field-level queries.
* Storage overhead - Maintaining separate sub-columns adds structural overhead compared to storing JSON as a single string value.

<h3 id="use-json-type">
  Use the `JSON` type when:
</h3>

* Your data has a dynamic or unpredictable structure with varying keys across documents
* Field types or schemas change over time or vary between records
* You need to query, filter, or aggregate on specific paths within JSON objects whose structure you can't predict upfront
* Your use case involves semi-structured data like logs, events, or user-generated content with inconsistent schemas

<h3 id="use-string-type">
  Use a `String` column (or structured types) when:
</h3>

* Your data structure is known and consistent - in this case, use normal columns, `Tuple`, `Array`, `Dynamic`, or `Variant` types instead
* `JSON` documents are treated as opaque blobs that are only stored and retrieved in their entirety without field-level analysis
* You don't need to query or filter on individual JSON fields within the database
* The `JSON` is simply a transport/storage format, not analyzed within ClickHouse

<Tip>
  If `JSON` is an opaque document that isn't analyzed inside the database, and only stored and retrieved back, it should be stored as a `String` field. The `JSON` type's benefits only materialize when you need to efficiently query, filter, or aggregate on specific fields within dynamic `JSON` structures.

  You can also mix approaches—use standard columns for predictable top-level fields and a `JSON` column for dynamic sections of the payload.
</Tip>

<h2 id="creating-json">
  Creating `JSON`
</h2>

In this section we'll take a look at the various ways that you can create `JSON`.

<h3 id="using-json-in-a-table-column-definition">
  Using `JSON` in a table column definition
</h3>

```sql title="Query (Example 1)" theme={null}
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42}, "c" : [1, 2, 3]}'), ('{"f" : "Hello, World!"}'), ('{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}');
SELECT json FROM test;
```

```text title="Response (Example 1)" theme={null}
┌─json────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"]}          │
│ {"f":"Hello, World!"}                       │
│ {"a":{"b":"43","e":"10"},"c":["4","5","6"]} │
└─────────────────────────────────────────────┘
```

```sql title="Query (Example 2)" theme={null}
CREATE TABLE test (json JSON(a.b UInt32, SKIP a.e)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42}, "c" : [1, 2, 3]}'), ('{"f" : "Hello, World!"}'), ('{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}');
SELECT json FROM test;
```

```text title="Response (Example 2)" theme={null}
┌─json──────────────────────────────┐
│ {"a":{"b":42},"c":["1","2","3"]}  │
│ {"a":{"b":0},"f":"Hello, World!"} │
│ {"a":{"b":43},"c":["4","5","6"]}  │
└───────────────────────────────────┘
```

<h3 id="using-cast-with-json">
  Using CAST with `::JSON`
</h3>

It is possible to cast various types using the special syntax `::JSON`.

<h4 id="cast-from-string-to-json">
  CAST from `String` to `JSON`
</h4>

```sql title="Query" theme={null}
SELECT '{"a" : {"b" : 42},"c" : [1, 2, 3], "d" : "Hello, World!"}'::JSON AS json;
```

```text title="Response" theme={null}
┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘
```

<h4 id="cast-from-tuple-to-json">
  CAST from `Tuple` to `JSON`
</h4>

```sql title="Query" theme={null}
SET enable_named_columns_in_function_tuple = 1;
SELECT (tuple(42 AS b) AS a, [1, 2, 3] AS c, 'Hello, World!' AS d)::JSON AS json;
```

```text title="Response" theme={null}
┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘
```

<h4 id="cast-from-map-to-json">
  CAST from `Map` to `JSON`
</h4>

```sql title="Query" theme={null}
SET use_variant_as_common_type=1;
SELECT map('a', map('b', 42), 'c', [1,2,3], 'd', 'Hello, World!')::JSON AS json;
```

```text title="Response" theme={null}
┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘
```

<Note>
  JSON paths are stored flattened. This means that when a JSON object is formatted from a path like `a.b.c`
  it is not possible to know whether the object should be constructed as `{ "a.b.c" : ... }` or `{ "a": { "b": { "c": ... } } }`.
  Our implementation will always assume the latter.

  For example:

  ```sql title="Query" theme={null}
  SELECT CAST('{"a.b.c" : 42}', 'JSON') AS json
  ```

  will return:

  ```response title="Response" theme={null}
     ┌─json───────────────────┐
  1. │ {"a":{"b":{"c":"42"}}} │
     └────────────────────────┘
  ```

  and **not**:

  ```sql theme={null}
     ┌─json───────────┐
  1. │ {"a.b.c":"42"} │
     └────────────────┘
  ```
</Note>

<h2 id="reading-json-paths-as-sub-columns">
  Reading JSON paths as sub-columns
</h2>

The `JSON` type supports reading every path as a separate sub-column.
If the type of the requested path is not specified in the JSON type declaration,
then the sub column of the path will always have type [Dynamic](/reference/data-types/dynamic).

For example:

```sql title="Query" theme={null}
CREATE TABLE test (json JSON(a.b UInt32, SKIP a.e)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42, "g" : 42.42}, "c" : [1, 2, 3], "d" : "2020-01-01"}'), ('{"f" : "Hello, World!", "d" : "2020-01-02"}'), ('{"a" : {"b" : 43, "e" : 10, "g" : 43.43}, "c" : [4, 5, 6]}');
SELECT json FROM test;
```

```text title="Response" theme={null}
┌─json────────────────────────────────────────────────────────┐
│ {"a":{"b":42,"g":42.42},"c":["1","2","3"],"d":"2020-01-01"} │
│ {"a":{"b":0},"d":"2020-01-02","f":"Hello, World!"}          │
│ {"a":{"b":43,"g":43.43},"c":["4","5","6"]}                  │
└─────────────────────────────────────────────────────────────┘
```

```sql title="Query (Reading JSON paths as sub-columns)" theme={null}
SELECT json.a.b, json.a.g, json.c, json.d FROM test;
```

```text title="Response (Reading JSON paths as sub-columns)" theme={null}
┌─json.a.b─┬─json.a.g─┬─json.c──┬─json.d─────┐
│       42 │ 42.42    │ [1,2,3] │ 2020-01-01 │
│        0 │ ᴺᵁᴸᴸ     │ ᴺᵁᴸᴸ    │ 2020-01-02 │
│       43 │ 43.43    │ [4,5,6] │ ᴺᵁᴸᴸ       │
└──────────┴──────────┴─────────┴────────────┘
```

You can also use `getSubcolumn` function to read subcolumns from JSON type:

```sql title="Query" theme={null}
SELECT getSubcolumn(json, 'a.b'), getSubcolumn(json, 'a.g'), getSubcolumn(json, 'c'), getSubcolumn(json, 'd') FROM test;
```

```text title="Response" theme={null}
┌─getSubcolumn(json, 'a.b')─┬─getSubcolumn(json, 'a.g')─┬─getSubcolumn(json, 'c')─┬─getSubcolumn(json, 'd')─┐
│                        42 │ 42.42                     │ [1,2,3]                 │ 2020-01-01              │
│                         0 │ ᴺᵁᴸᴸ                      │ ᴺᵁᴸᴸ                    │ 2020-01-02              │
│                        43 │ 43.43                     │ [4,5,6]                 │ ᴺᵁᴸᴸ                    │
└───────────────────────────┴───────────────────────────┴─────────────────────────┴─────────────────────────┘
```

If the requested path wasn't found in the data, it will be filled with `NULL` values:

```sql title="Query" theme={null}
SELECT json.non.existing.path FROM test;
```

```text title="Response" theme={null}
┌─json.non.existing.path─┐
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
└────────────────────────┘
```

Let's check the data types of the returned sub-columns:

```sql title="Query" theme={null}
SELECT toTypeName(json.a.b), toTypeName(json.a.g), toTypeName(json.c), toTypeName(json.d) FROM test;
```

```text title="Response" theme={null}
┌─toTypeName(json.a.b)─┬─toTypeName(json.a.g)─┬─toTypeName(json.c)─┬─toTypeName(json.d)─┐
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
└──────────────────────┴──────────────────────┴────────────────────┴────────────────────┘
```

As we can see, for `a.b`, the type is `UInt32` as we specified it to be in the JSON type declaration,
and for all other sub-columns the type is `Dynamic`.

It is also possible to read sub-columns of a `Dynamic` type using the special syntax `json.some.path.:TypeName`:

```sql title="Query" theme={null}
SELECT
    json.a.g.:Float64,
    dynamicType(json.a.g),
    json.d.:Date,
    dynamicType(json.d)
FROM test
```

```text title="Response" theme={null}
┌─json.a.g.:`Float64`─┬─dynamicType(json.a.g)─┬─json.d.:`Date`─┬─dynamicType(json.d)─┐
│               42.42 │ Float64               │     2020-01-01 │ Date                │
│                ᴺᵁᴸᴸ │ None                  │     2020-01-02 │ Date                │
│               43.43 │ Float64               │           ᴺᵁᴸᴸ │ None                │
└─────────────────────┴───────────────────────┴────────────────┴─────────────────────┘
```

`Dynamic` sub-columns can be cast to any data type. In this case an exception will be thrown if the internal type inside `Dynamic` cannot be cast to the requested type:

```sql title="Query" theme={null}
SELECT json.a.g::UInt64 AS uint
FROM test;
```

```text title="Response" theme={null}
┌─uint─┐
│   42 │
│    0 │
│   43 │
└──────┘
```

```sql title="Query" theme={null}
SELECT json.a.g::UUID AS float
FROM test;
```

```text title="Response" theme={null}
Received exception from server:
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception:
Conversion between numeric types and UUID is not supported.
Probably the passed UUID is unquoted:
while executing 'FUNCTION CAST(__table1.json.a.g :: 2, 'UUID'_String :: 1) -> CAST(__table1.json.a.g, 'UUID'_String) UUID : 0'.
(NOT_IMPLEMENTED)
```

<Note>
  To read subcolumns efficiently from Compact MergeTree parts make sure MergeTree setting [write\_marks\_for\_substreams\_in\_compact\_parts](/reference/settings/merge-tree-settings#write_marks_for_substreams_in_compact_parts) is enabled.
</Note>

<h2 id="reading-json-sub-objects-as-sub-columns">
  Reading JSON sub-objects as sub-columns
</h2>

The `JSON` type supports reading nested objects as sub-columns with type `JSON` using the special syntax `json.^some.path`:

```sql title="Query" theme={null}
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : {"c" : 42, "g" : 42.42}}, "c" : [1, 2, 3], "d" : {"e" : {"f" : {"g" : "Hello, World", "h" : [1, 2, 3]}}}}'), ('{"f" : "Hello, World!", "d" : {"e" : {"f" : {"h" : [4, 5, 6]}}}}'), ('{"a" : {"b" : {"c" : 43, "e" : 10, "g" : 43.43}}, "c" : [4, 5, 6]}');
SELECT json FROM test;
```

```text title="Response" theme={null}
┌─json──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":"42","g":42.42}},"c":["1","2","3"],"d":{"e":{"f":{"g":"Hello, World","h":["1","2","3"]}}}} │
│ {"d":{"e":{"f":{"h":["4","5","6"]}}},"f":"Hello, World!"}                                                 │
│ {"a":{"b":{"c":"43","e":"10","g":43.43}},"c":["4","5","6"]}                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT json.^a.b, json.^d.e.f FROM test;
```

```text title="Response" theme={null}
┌─json.^`a`.b───────────────────┬─json.^`d`.e.f──────────────────────────┐
│ {"c":"42","g":42.42}          │ {"g":"Hello, World","h":["1","2","3"]} │
│ {}                            │ {"h":["4","5","6"]}                    │
│ {"c":"43","e":"10","g":43.43} │ {}                                     │
└───────────────────────────────┴────────────────────────────────────────┘
```

<Note>
  When paths are stored in basic (`map`) [shared data](#shared-data-structure), reading sub-object sub-columns may be inefficient as it requires scanning the entire shared data structure. With `map_with_buckets` or `advanced` shared data serialization, reading sub-columns from shared data is highly optimized.
</Note>

<h2 id="reading-json-combined-sub-columns">
  Reading JSON combined sub-columns
</h2>

The `JSON` type supports reading a path as a **combined sub-column** using the special syntax `json.@some.path`.
A combined sub-column for a given path returns:

* The literal value stored at that path as `Dynamic`, if the path has a literal value.
* A JSON sub-object at that path as `Dynamic`, if the path has no literal value but has nested sub-paths.
* `NULL`, if neither a literal value nor any sub-paths exist for that path.

This is useful when a path may hold either a scalar value or a nested object across different rows, and is more convenient than separately querying the literal sub-column (`json.a`) and the sub-object sub-column (`json.^a`).

The following example compares all three sub-column types for path `a`:

```sql title="Query" theme={null}
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : 42, "b" : {"c" : 1, "d" : "Hello"}}'), ('{"a" : {"x": 1, "y": 2}, "b" : {"c" : 1}}'), ('{"c" : "World"}');
SELECT json FROM test;
```

```text title="Response" theme={null}
┌─json────────────────────────────┐
│ {"a":42,"b":{"c":1,"d":"Hello"}}│
│ {"a":{"x":1,"y":2},"b":{"c":1}}│
│ {"c":"World"}                   │
└─────────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT
    json.a,
    dynamicType(json.a),
    json.^a,
    toTypeName(json.^a),
    json.@a,
    dynamicType(json.@a)
FROM test;
```

```text title="Response" theme={null}
┌─json.a─┬─dynamicType(json.a)─┬─json.^a───────┬─toTypeName(json.^a)─┬─json.@a───────┬─dynamicType(json.@a)─┐
│ 42     │ Int64               │ {}            │ JSON                │ 42            │ Int64                │
│ NULL   │ None                │ {"x":1,"y":2} │ JSON                │ {"x":1,"y":2} │ JSON                 │
│ NULL   │ None                │ {}            │ JSON                │ NULL          │ None                 │
└────────┴─────────────────────┴───────────────┴─────────────────────┴───────────────┴──────────────────────┘
```

* Row 1: `a` holds a literal `42`. `json.a` returns it as `Dynamic(Int64)`, `json.^a` returns an empty sub-object `{}` (no nested keys under `a`), and `json.@a` returns the literal `42`.
* Row 2: `a` holds a nested object. `json.a` returns `NULL` (no literal at that path), `json.^a` returns the sub-object as `JSON`, and `json.@a` also returns the sub-object as `Dynamic(JSON)`.
* Row 3: `a` is absent entirely. Both `json.a` and `json.@a` return `NULL`, while `json.^a` returns an empty `{}`.

<Note>
  When paths are stored in basic (`map`) [shared data](#shared-data-structure), reading combined sub-columns may be inefficient as it requires scanning the entire shared data structure. With `map_with_buckets` or `advanced` shared data serialization, reading sub-columns from shared data is highly optimized.
</Note>

<h2 id="type-inference-for-paths">
  Type inference for paths
</h2>

During parsing of `JSON`, ClickHouse tries to detect the most appropriate data type for each JSON path.
It works similarly to [automatic schema inference from input data](/concepts/features/interfaces/schema-inference),
and is controlled by the same settings:

* [input\_format\_try\_infer\_dates](/reference/settings/formats#input_format_try_infer_dates)
* [input\_format\_try\_infer\_datetimes](/reference/settings/formats#input_format_try_infer_datetimes)
* [schema\_inference\_make\_columns\_nullable](/reference/settings/formats#schema_inference_make_columns_nullable)
* [input\_format\_json\_try\_infer\_numbers\_from\_strings](/reference/settings/formats#input_format_json_try_infer_numbers_from_strings)
* [input\_format\_json\_infer\_incomplete\_types\_as\_strings](/reference/settings/formats#input_format_json_infer_incomplete_types_as_strings)
* [input\_format\_json\_read\_numbers\_as\_strings](/reference/settings/formats#input_format_json_read_numbers_as_strings)
* [input\_format\_json\_read\_bools\_as\_strings](/reference/settings/formats#input_format_json_read_bools_as_strings)
* [input\_format\_json\_read\_bools\_as\_numbers](/reference/settings/formats#input_format_json_read_bools_as_numbers)
* [input\_format\_json\_read\_arrays\_as\_strings](/reference/settings/formats#input_format_json_read_arrays_as_strings)
* [input\_format\_json\_infer\_array\_of\_dynamic\_from\_array\_of\_different\_types](/reference/settings/formats#input_format_json_infer_array_of_dynamic_from_array_of_different_types)

Let's take a look at some examples:

```sql title="Query" theme={null}
SELECT JSONAllPathsWithTypes('{"a" : "2020-01-01", "b" : "2020-01-01 10:00:00"}'::JSON) AS paths_with_types settings input_format_try_infer_dates=1, input_format_try_infer_datetimes=1;
```

```text title="Response" theme={null}
┌─paths_with_types─────────────────┐
│ {'a':'Date','b':'DateTime64(9)'} │
└──────────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT JSONAllPathsWithTypes('{"a" : "2020-01-01", "b" : "2020-01-01 10:00:00"}'::JSON) AS paths_with_types settings input_format_try_infer_dates=0, input_format_try_infer_datetimes=0;
```

```text title="Response" theme={null}
┌─paths_with_types────────────┐
│ {'a':'String','b':'String'} │
└─────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT JSONAllPathsWithTypes('{"a" : [1, 2, 3]}'::JSON) AS paths_with_types settings schema_inference_make_columns_nullable=1;
```

```text title="Response" theme={null}
┌─paths_with_types───────────────┐
│ {'a':'Array(Nullable(Int64))'} │
└────────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT JSONAllPathsWithTypes('{"a" : [1, 2, 3]}'::JSON) AS paths_with_types settings schema_inference_make_columns_nullable=0;
```

```text title="Response" theme={null}
┌─paths_with_types─────┐
│ {'a':'Array(Int64)'} │
└──────────────────────┘
```

<h2 id="handling-arrays-of-json-objects">
  Handling arrays of JSON objects
</h2>

JSON paths that contain an array of objects are parsed as type `Array(JSON)` and inserted into a `Dynamic` column for the path.
To read an array of objects, you can extract it from the `Dynamic` column as a sub-column:

```sql title="Query" theme={null}
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES
('{"a" : {"b" : [{"c" : 42, "d" : "Hello", "f" : [[{"g" : 42.42}]], "k" : {"j" : 1000}}, {"c" : 43}, {"e" : [1, 2, 3], "d" : "My", "f" : [[{"g" : 43.43, "h" : "2020-01-01"}]],  "k" : {"j" : 2000}}]}}'),
('{"a" : {"b" : [1, 2, 3]}}'),
('{"a" : {"b" : [{"c" : 44, "f" : [[{"h" : "2020-01-02"}]]}, {"e" : [4, 5, 6], "d" : "World", "f" : [[{"g" : 44.44}]],  "k" : {"j" : 3000}}]}}');
SELECT json FROM test;
```

```text title="Response" theme={null}
┌─json────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ {"a":{"b":[{"c":"42","d":"Hello","f":[[{"g":42.42}]],"k":{"j":"1000"}},{"c":"43"},{"d":"My","e":["1","2","3"],"f":[[{"g":43.43,"h":"2020-01-01"}]],"k":{"j":"2000"}}]}} │
│ {"a":{"b":["1","2","3"]}}                                                                                                                                               │
│ {"a":{"b":[{"c":"44","f":[[{"h":"2020-01-02"}]]},{"d":"World","e":["4","5","6"],"f":[[{"g":44.44}]],"k":{"j":"3000"}}]}}                                                │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

```sql title="Query" theme={null}
SELECT json.a.b, dynamicType(json.a.b) FROM test;
```

```text title="Response" theme={null}
┌─json.a.b──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─dynamicType(json.a.b)────────────────────────────────────┐
│ ['{"c":"42","d":"Hello","f":[[{"g":42.42}]],"k":{"j":"1000"}}','{"c":"43"}','{"d":"My","e":["1","2","3"],"f":[[{"g":43.43,"h":"2020-01-01"}]],"k":{"j":"2000"}}'] │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
│ [1,2,3]                                                                                                                                                           │ Array(Nullable(Int64))                                   │
│ ['{"c":"44","f":[[{"h":"2020-01-02"}]]}','{"d":"World","e":["4","5","6"],"f":[[{"g":44.44}]],"k":{"j":"3000"}}']                                                  │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────┘
```

As you may have noticed, the `max_dynamic_types`/`max_dynamic_paths` parameters of the nested `JSON` type got reduced compared to the default values.
This is needed to avoid the number of sub-columns growing uncontrollably on nested arrays of JSON objects.

Let's try to read sub-columns from a nested `JSON` column:

```sql title="Query" theme={null}
SELECT json.a.b.:`Array(JSON)`.c, json.a.b.:`Array(JSON)`.f, json.a.b.:`Array(JSON)`.d FROM test;
```

```text title="Response" theme={null}
┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘
```

We can avoid writing `Array(JSON)` sub-column names using a special syntax:

```sql title="Query" theme={null}
SELECT json.a.b[].c, json.a.b[].f, json.a.b[].d FROM test;
```

```text title="Response" theme={null}
┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘
```

The number of `[]` after the path indicates the array level. For example, `json.path[][]` will be transformed to `json.path.:Array(Array(JSON))`

Let's check the paths and types inside our `Array(JSON)`:

```sql title="Query" theme={null}
SELECT DISTINCT arrayJoin(JSONAllPathsWithTypes(arrayJoin(json.a.b[]))) FROM test;
```

```text title="Response" theme={null}
┌─arrayJoin(JSONAllPathsWithTypes(arrayJoin(json.a.b.:`Array(JSON)`)))──┐
│ ('c','Int64')                                                         │
│ ('d','String')                                                        │
│ ('f','Array(Array(JSON(max_dynamic_types=8, max_dynamic_paths=64)))') │
│ ('k.j','Int64')                                                       │
│ ('e','Array(Nullable(Int64))')                                        │
└───────────────────────────────────────────────────────────────────────┘
```

Let's read sub-columns from an `Array(JSON)` column:

```sql title="Query" theme={null}
SELECT json.a.b[].c.:Int64, json.a.b[].f[][].g.:Float64, json.a.b[].f[][].h.:Date FROM test;
```

```text title="Response" theme={null}
┌─json.a.b.:`Array(JSON)`.c.:`Int64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.g.:`Float64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.h.:`Date`─┐
│ [42,43,NULL]                       │ [[[42.42]],[],[[43.43]]]                                     │ [[[NULL]],[],[['2020-01-01']]]                            │
│ []                                 │ []                                                           │ []                                                        │
│ [44,NULL]                          │ [[[NULL]],[[44.44]]]                                         │ [[['2020-01-02']],[[NULL]]]                               │
└────────────────────────────────────┴──────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘
```

We can also read sub-object sub-columns from a nested `JSON` column:

```sql title="Query" theme={null}
SELECT json.a.b[].^k FROM test
```

```text title="Response" theme={null}
┌─json.a.b.:`Array(JSON)`.^`k`─────────┐
│ ['{"j":"1000"}','{}','{"j":"2000"}'] │
│ []                                   │
│ ['{}','{"j":"3000"}']                │
└──────────────────────────────────────┘
```

<h2 id="handling-json-keys-with-nulls">
  Handling JSON keys with NULL
</h2>

In our JSON implementation `null` and absence of the value are considered equivalent:

```sql title="Query" theme={null}
SELECT '{}'::JSON AS json1, '{"a" : null}'::JSON AS json2, json1 = json2
```

```text title="Response" theme={null}
┌─json1─┬─json2─┬─equals(json1, json2)─┐
│ {}    │ {}    │                    1 │
└───────┴───────┴──────────────────────┘
```

It means that it's impossible to determine whether the original JSON data contained some path with the NULL value or didn't contain it at all.

<h2 id="handling-json-keys-with-dots">
  Handling JSON keys with dots
</h2>

Internally JSON column stores all paths and values in a flattened form. It means that by default these 2 objects are considered as the same:

```json theme={null}
{"a" : {"b" : 42}}
{"a.b" : 42}
```

They both will be stored internally as a pair of path `a.b` and value `42`. During formatting of JSON we always form nested objects based on the path parts separated by dot:

```sql title="Query" theme={null}
SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);
```

```text title="Response" theme={null}
┌─json1────────────┬─json2────────────┬─JSONAllPaths(json1)─┬─JSONAllPaths(json2)─┐
│ {"a":{"b":"42"}} │ {"a":{"b":"42"}} │ ['a.b']             │ ['a.b']             │
└──────────────────┴──────────────────┴─────────────────────┴─────────────────────┘
```

As you can see, initial JSON `{"a.b" : 42}` is now formatted as `{"a" : {"b" : 42}}`.

This limitation also leads to the failure of parsing valid JSON objects like this:

```sql title="Query" theme={null}
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json;
```

```text title="Response" theme={null}
Code: 117. DB::Exception: Cannot insert data into JSON column: Duplicate path found during parsing JSON object: a.b. You can enable setting type_json_skip_duplicated_paths to skip duplicated paths during insert: In scope SELECT CAST('{"a.b" : 42, "a" : {"b" : "Hello, World"}}', 'JSON') AS json. (INCORRECT_DATA)
```

If you want to keep keys with dots and avoid formatting them as nested objects, you can enable
setting [json\_type\_escape\_dots\_in\_keys](/reference/settings/formats#json_type_escape_dots_in_keys) (available starting from version `25.8`). In this case during parsing all dots in JSON keys will be
escaped into `%2E` and unescaped back during formatting.

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);
```

```text title="Response" theme={null}
┌─json1────────────┬─json2────────┬─JSONAllPaths(json1)─┬─JSONAllPaths(json2)─┐
│ {"a":{"b":"42"}} │ {"a.b":"42"} │ ['a.b']             │ ['a%2Eb']           │
└──────────────────┴──────────────┴─────────────────────┴─────────────────────┘
```

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, JSONAllPaths(json);
```

```text title="Response" theme={null}
┌─json──────────────────────────────────┬─JSONAllPaths(json)─┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ ['a%2Eb','a.b']    │
└───────────────────────────────────────┴────────────────────┘
```

To read key with escaped dot as a subcolumn you have to use escaped dot in the subcolumn name:

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, json.`a%2Eb`, json.a.b;
```

```text title="Response" theme={null}
┌─json──────────────────────────────────┬─json.a%2Eb─┬─json.a.b─────┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ 42         │ Hello World! │
└───────────────────────────────────────┴────────────┴──────────────┘
```

Note: due to identifiers parser and analyzer limitations subcolumn `` json.`a.b` `` is equivalent to subcolumn `json.a.b` and won't read path with escaped dot:

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, json.`a%2Eb`, json.`a.b`, json.a.b;
```

```text title="Response" theme={null}
┌─json──────────────────────────────────┬─json.a%2Eb─┬─json.a.b─────┬─json.a.b─────┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ 42         │ Hello World! │ Hello World! │
└───────────────────────────────────────┴────────────┴──────────────┴──────────────┘
```

Also, if you want to specify a hint for a JSON path that contains keys with dots (or use it in the `SKIP`/`SKIP REGEX` sections), you have to use escaped dots in the hint:

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON(`a%2Eb` UInt8) as json, json.`a%2Eb`, toTypeName(json.`a%2Eb`);
```

```text title="Response" theme={null}
┌─json────────────────────────────────┬─json.a%2Eb─┬─toTypeName(json.a%2Eb)─┐
│ {"a.b":42,"a":{"b":"Hello World!"}} │         42 │ UInt8                  │
└─────────────────────────────────────┴────────────┴────────────────────────┘
```

```sql title="Query" theme={null}
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON(SKIP `a%2Eb`) as json, json.`a%2Eb`;
```

```text title="Response" theme={null}
┌─json───────────────────────┬─json.a%2Eb─┐
│ {"a":{"b":"Hello World!"}} │ ᴺᵁᴸᴸ       │
└────────────────────────────┴────────────┘
```

<h2 id="reading-json-type-from-data">
  Reading JSON type from data
</h2>

All text formats
([`JSONEachRow`](/reference/formats/JSON/JSONEachRow),
[`TSV`](/reference/formats/TabSeparated/TabSeparated),
[`CSV`](/reference/formats/CSV/CSV),
[`CustomSeparated`](/reference/formats/CustomSeparated/CustomSeparated),
[`Values`](/reference/formats/Values), etc.) support reading the `JSON` type.

Examples:

```sql title="Query" theme={null}
SELECT json FROM format(JSONEachRow, 'json JSON(a.b.c UInt32, SKIP a.b.d, SKIP d.e, SKIP REGEXP \'b.*\')', '
{"json" : {"a" : {"b" : {"c" : 1, "d" : [0, 1]}}, "b" : "2020-01-01", "c" : 42, "d" : {"e" : {"f" : ["s1", "s2"]}, "i" : [1, 2, 3]}}}
{"json" : {"a" : {"b" : {"c" : 2, "d" : [2, 3]}}, "b" : [1, 2, 3], "c" : null, "d" : {"e" : {"g" : 43}, "i" : [4, 5, 6]}}}
{"json" : {"a" : {"b" : {"c" : 3, "d" : [4, 5]}}, "b" : {"c" : 10}, "e" : "Hello, World!"}}
{"json" : {"a" : {"b" : {"c" : 4, "d" : [6, 7]}}, "c" : 43}}
{"json" : {"a" : {"b" : {"c" : 5, "d" : [8, 9]}}, "b" : {"c" : 11, "j" : [1, 2, 3]}, "d" : {"e" : {"f" : ["s3", "s4"], "g" : 44}, "h" : "2020-02-02 10:00:00"}}}
')
```

```text title="Response" theme={null}
┌─json──────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":1}},"c":"42","d":{"i":["1","2","3"]}}          │
│ {"a":{"b":{"c":2}},"d":{"i":["4","5","6"]}}                   │
│ {"a":{"b":{"c":3}},"e":"Hello, World!"}                       │
│ {"a":{"b":{"c":4}},"c":"43"}                                  │
│ {"a":{"b":{"c":5}},"d":{"h":"2020-02-02 10:00:00.000000000"}} │
└───────────────────────────────────────────────────────────────┘
```

For text formats like `CSV`/`TSV`/etc, `JSON` is parsed from a string containing the JSON object:

```sql title="Query" theme={null}
SELECT json FROM format(TSV, 'json JSON(a.b.c UInt32, SKIP a.b.d, SKIP REGEXP \'b.*\')',
'{"a" : {"b" : {"c" : 1, "d" : [0, 1]}}, "b" : "2020-01-01", "c" : 42, "d" : {"e" : {"f" : ["s1", "s2"]}, "i" : [1, 2, 3]}}
{"a" : {"b" : {"c" : 2, "d" : [2, 3]}}, "b" : [1, 2, 3], "c" : null, "d" : {"e" : {"g" : 43}, "i" : [4, 5, 6]}}
{"a" : {"b" : {"c" : 3, "d" : [4, 5]}}, "b" : {"c" : 10}, "e" : "Hello, World!"}
{"a" : {"b" : {"c" : 4, "d" : [6, 7]}}, "c" : 43}
{"a" : {"b" : {"c" : 5, "d" : [8, 9]}}, "b" : {"c" : 11, "j" : [1, 2, 3]}, "d" : {"e" : {"f" : ["s3", "s4"], "g" : 44}, "h" : "2020-02-02 10:00:00"}}')
```

```text title="Response" theme={null}
┌─json──────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":1}},"c":"42","d":{"i":["1","2","3"]}}          │
│ {"a":{"b":{"c":2}},"d":{"i":["4","5","6"]}}                   │
│ {"a":{"b":{"c":3}},"e":"Hello, World!"}                       │
│ {"a":{"b":{"c":4}},"c":"43"}                                  │
│ {"a":{"b":{"c":5}},"d":{"h":"2020-02-02 10:00:00.000000000"}} │
└───────────────────────────────────────────────────────────────┘
```

<h2 id="reaching-the-limit-of-dynamic-paths-inside-json">
  Reaching the limit of dynamic paths inside JSON
</h2>

The `JSON` data type can store only a limited number of paths as separate sub-columns internally.
By default, this limit is `1024`, but you can change it in the type declaration using parameter `max_dynamic_paths`.

When the limit is reached, all new paths inserted to a `JSON` column will be stored in a single shared data structure.
It's still possible to read such paths as sub-columns,
but it might be less efficient ([see section about shared data](#shared-data-structure)).
This limit is needed to avoid having an enormous number of different sub-columns that can make the table unusable.

Let's see what happens when the limit is reached in a few different scenarios.

<h3 id="reaching-the-limit-during-data-parsing">
  Reaching the limit during data parsing
</h3>

During parsing of `JSON` objects from data, when the limit is reached for the current block of data,
all new paths will be stored in a shared data structure. We can use the following two introspection functions `JSONDynamicPaths`, `JSONSharedDataPaths`:

```sql title="Query" theme={null}
SELECT json, JSONDynamicPaths(json), JSONSharedDataPaths(json) FROM format(JSONEachRow, 'json JSON(max_dynamic_paths=3)', '
{"json" : {"a" : {"b" : 42}, "c" : [1, 2, 3]}}
{"json" : {"a" : {"b" : 43}, "d" : "2020-01-01"}}
{"json" : {"a" : {"b" : 44}, "c" : [4, 5, 6]}}
{"json" : {"a" : {"b" : 43}, "d" : "2020-01-02", "e" : "Hello", "f" : {"g" : 42.42}}}
{"json" : {"a" : {"b" : 43}, "c" : [7, 8, 9], "f" : {"g" : 43.43}, "h" : "World"}}
')
```

```text title="Response" theme={null}
┌─json───────────────────────────────────────────────────────────┬─JSONDynamicPaths(json)─┬─JSONSharedDataPaths(json)─┐
│ {"a":{"b":"42"},"c":["1","2","3"]}                             │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"43"},"d":"2020-01-01"}                              │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"44"},"c":["4","5","6"]}                             │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"43"},"d":"2020-01-02","e":"Hello","f":{"g":42.42}}  │ ['a.b','c','d']        │ ['e','f.g']               │
│ {"a":{"b":"43"},"c":["7","8","9"],"f":{"g":43.43},"h":"World"} │ ['a.b','c','d']        │ ['f.g','h']               │
└────────────────────────────────────────────────────────────────┴────────────────────────┴───────────────────────────┘
```

As we can see, after inserting paths `e` and `f.g` the limit was reached,
and they got inserted into a shared data structure.

<h3 id="during-merges-of-data-parts-in-mergetree-table-engines">
  During merges of data parts in MergeTree table engines
</h3>

During a merge of several data parts in a `MergeTree` table the `JSON` column in the resulting data part can reach the limit of dynamic paths
and won't be able to store all paths from source parts as sub-columns.
In this case, ClickHouse chooses what paths will remain as sub-columns after merge and what paths will be stored in the shared data structure.
In most cases, ClickHouse tries to keep paths that contain
the largest number of non-null values and move the rarest paths to the shared data structure. This does, however, depend on the implementation.

Let's see an example of such a merge.
First, let's create a table with a `JSON` column, set the limit of dynamic paths to `3` and then insert values with `5` different paths:

```sql title="Query" theme={null}
CREATE TABLE test (id UInt64, json JSON(max_dynamic_paths=3)) ENGINE=MergeTree ORDER BY id;
SYSTEM STOP MERGES test;
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as a) FROM numbers(5);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as b) FROM numbers(4);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as c) FROM numbers(3);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as d) FROM numbers(2);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as e) FROM numbers(1);
```

Each insert will create a separate data part with the `JSON` column containing a single path:

```sql title="Query" theme={null}
SELECT
    count(),
    groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths,
    groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths,
    _part
FROM test
GROUP BY _part
ORDER BY _part ASC
```

```text title="Response" theme={null}
┌─count()─┬─dynamic_paths─┬─shared_data_paths─┬─_part─────┐
│       5 │ ['a']         │ []                │ all_1_1_0 │
│       4 │ ['b']         │ []                │ all_2_2_0 │
│       3 │ ['c']         │ []                │ all_3_3_0 │
│       2 │ ['d']         │ []                │ all_4_4_0 │
│       1 │ ['e']         │ []                │ all_5_5_0 │
└─────────┴───────────────┴───────────────────┴───────────┘
```

Now, let's merge all parts into one and see what will happen:

```sql title="Query" theme={null}
SELECT
    count(),
    groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths,
    groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths,
    _part
FROM test
GROUP BY _part
ORDER BY _part ASC
```

```text title="Response" theme={null}
┌─count()─┬─dynamic_paths─┬─shared_data_paths─┬─_part─────┐
│      15 │ ['a','b','c'] │ ['d','e']         │ all_1_5_2 │
└─────────┴───────────────┴───────────────────┴───────────┘
```

As we can see, ClickHouse kept the most frequent paths `a`, `b` and `c` and moved paths `d` and `e` to a shared data structure.

<h2 id="shared-data-structure">
  Shared data structure
</h2>

As was described in the previous section, when the `max_dynamic_paths` limit is reached all new paths are stored in a single shared data structure.
In this section we will look into the details of the shared data structure and how we read paths sub-columns from it.

See section ["introspection functions"](/reference/data-types/newjson#introspection-functions) for details of functions used for inspecting the contents of a JSON column.

<h3 id="shared-data-structure-in-memory">
  Shared data structure in memory
</h3>

In memory, shared data structure is just a sub-column with type `Map(String, String)` that stores mapping from a flattened JSON path to a binary encoded value.
To extract a path subcolumn from it, we just iterate over all rows in this `Map` column and try to find the requested path and its values.

<h3 id="shared-data-structure-in-merge-tree-parts">
  Shared data structure in MergeTree parts
</h3>

In [MergeTree](/reference/engines/table-engines/mergetree-family/mergetree) tables we store data in data parts that stores everything on disk (local or remote). And data on disk can be stored in a different way compared to memory.
Currently, there are 3 different shared data structure serializations in MergeTree data parts: `map`, `map_with_buckets`
and `advanced`.

The serialization version is controlled by MergeTree
settings [object\_shared\_data\_serialization\_version](/reference/settings/merge-tree-settings#object_shared_data_serialization_version)
and [object\_shared\_data\_serialization\_version\_for\_zero\_level\_parts](/reference/settings/merge-tree-settings#object_shared_data_serialization_version_for_zero_level_parts)
(zero level part is the part created during inserting data into the table, during merges parts have higher level).

Note: changing shared data structure serialization is supported only
for `v3` [object serialization version](/reference/settings/merge-tree-settings#object_serialization_version)

<h4 id="shared-data-map">
  Map
</h4>

In `map` serialization version shared data is serialized as a single column with type `Map(String, String)` the same as it's stored in
memory. To read path sub-column from this type of serialization ClickHouse reads the whole `Map` column and
extracts the requested path in memory.

This serialization is efficient for writing data and reading the whole `JSON` column, but it's not efficient for reading paths sub-columns.

<h4 id="shared-data-map-with-buckets">
  Map with buckets
</h4>

In `map_with_buckets` serialization version shared data is serialized as `N` columns ("buckets") with type `Map(String, String)`.
Each such bucket contains only subset of paths. To read path sub-column from this type of serialization ClickHouse
reads the whole `Map` column from a single bucket and extracts the requested path in memory.

This serialization is less efficient for writing data and reading the whole `JSON` column, but it's more efficient for reading paths sub-columns
because it reads data only from required buckets.

Number of buckets `N` is controlled by MergeTree settings [object\_shared\_data\_buckets\_for\_compact\_part](/reference/settings/merge-tree-settings#object_shared_data_buckets_for_compact_part) (8 by default)
and [object\_shared\_data\_buckets\_for\_wide\_part](/reference/settings/merge-tree-settings#object_shared_data_buckets_for_wide_part) (32 by default).
The maximum allowed value for both settings is 256.

<h4 id="shared-data-advanced">
  Advanced
</h4>

In `advanced` serialization version shared data is serialized in a special data structure that maximizes the performance
of paths sub-columns reading by storing some additional information that allows to read only the data of requested paths.
This serialization also supports buckets, so each bucket contains only sub-set of paths.

This serialization is quite inefficient for writing data (so it's not recommended to use this serialization for zero-level parts), reading the whole `JSON` column is slightly less efficient compared to `map` serialization, but it's very efficient for reading paths sub-columns.

Note: because of storing some additional information inside the data structure, the disk storage size is higher with this serialization compared to
`map` and `map_with_buckets` serializations.

For more detailed overview of the new shared data serializations and implementation details read the [blog post](https://clickhouse.com/blog/json-data-type-gets-even-better).

<h2 id="controlling-the-number-of-dynamic-paths">
  Controlling the number of dynamic paths inside JSON in MergeTree parts
</h2>

The main way to set a limit on dynamic paths in JSON is to use `max_dynamic_paths` parameter inside the JSON type declaration.
But changing `max_dynamic_paths` for existing columns requires running `ALTER TABLE <table> MODIFY COLUMN <column> JSON(max_dynamic_paths=K)` that will start a background mutation that will rewrite all existing parts.
Such mutation can be really heavy and can affect the server performance until the mutation is finished. To avoid this, you can use these 3 settings that can help you to change the limit on dynamic paths in MergeTree tables for new data parts:

* `merge_max_dynamic_subcolumns_in_wide_part` - a MergeTree setting that limits the number of dynamic subcolumns for each JSON column during merge into a Wide data part.
* `merge_max_dynamic_subcolumns_in_compact_part` - a MergeTree setting that limits the number of dynamic subcolumns for each JSON column during merge into a Compact data part.
* `max_dynamic_subcolumns_in_json_type_parsing` - a session setting that limits the number of dynamic subcolumns for each JSON column during parsing of JSON data into a JSON column.

Note: limit on dynamic paths cannot exceed the value specified in `max_dynamic_paths` parameter, even if values of described settings are higher.

<h2 id="introspection-functions">
  Introspection functions
</h2>

There are several functions that can help to inspect the content of the JSON column:

* [`JSONAllPaths`](/reference/functions/regular-functions/json-functions#JSONAllPaths)
* [`JSONAllPathsWithTypes`](/reference/functions/regular-functions/json-functions#JSONAllPathsWithTypes)
* [`JSONAllValues`](/reference/functions/regular-functions/json-functions#JSONAllValues)
* [`JSONDynamicPaths`](/reference/functions/regular-functions/json-functions#JSONDynamicPaths)
* [`JSONDynamicPathsWithTypes`](/reference/functions/regular-functions/json-functions#JSONDynamicPathsWithTypes)
* [`JSONSharedDataPaths`](/reference/functions/regular-functions/json-functions#JSONSharedDataPaths)
* [`JSONSharedDataPathsWithTypes`](/reference/functions/regular-functions/json-functions#JSONSharedDataPathsWithTypes)
* [`distinctDynamicTypes`](/reference/functions/aggregate-functions/distinctDynamicTypes)
* [`distinctJSONPaths and distinctJSONPathsAndTypes`](/reference/functions/aggregate-functions/distinctJSONPaths)

**Examples**

Let's investigate the content of the [GH Archive](https://www.gharchive.org/) dataset for the date `2020-01-01`:

```sql title="Query" theme={null}
SELECT arrayJoin(distinctJSONPaths(json))
FROM s3('s3://clickhouse-public-datasets/gharchive/original/2020-01-01-*.json.gz', JSONAsObject)
```

```text title="Response" theme={null}
┌─arrayJoin(distinctJSONPaths(json))─────────────────────────┐
│ actor.avatar_url                                           │
│ actor.display_login                                        │
│ actor.gravatar_id                                          │
│ actor.id                                                   │
│ actor.login                                                │
│ actor.url                                                  │
│ created_at                                                 │
│ id                                                         │
│ org.avatar_url                                             │
│ org.gravatar_id                                            │
│ org.id                                                     │
│ org.login                                                  │
│ org.url                                                    │
│ payload.action                                             │
│ payload.before                                             │
│ payload.comment._links.html.href                           │
│ payload.comment._links.pull_request.href                   │
│ payload.comment._links.self.href                           │
│ payload.comment.author_association                         │
│ payload.comment.body                                       │
│ payload.comment.commit_id                                  │
│ payload.comment.created_at                                 │
│ payload.comment.diff_hunk                                  │
│ payload.comment.html_url                                   │
│ payload.comment.id                                         │
│ payload.comment.in_reply_to_id                             │
│ payload.comment.issue_url                                  │
│ payload.comment.line                                       │
│ payload.comment.node_id                                    │
│ payload.comment.original_commit_id                         │
│ payload.comment.original_position                          │
│ payload.comment.path                                       │
│ payload.comment.position                                   │
│ payload.comment.pull_request_review_id                     │
...
│ payload.release.node_id                                    │
│ payload.release.prerelease                                 │
│ payload.release.published_at                               │
│ payload.release.tag_name                                   │
│ payload.release.tarball_url                                │
│ payload.release.target_commitish                           │
│ payload.release.upload_url                                 │
│ payload.release.url                                        │
│ payload.release.zipball_url                                │
│ payload.size                                               │
│ public                                                     │
│ repo.id                                                    │
│ repo.name                                                  │
│ repo.url                                                   │
│ type                                                       │
└─arrayJoin(distinctJSONPaths(json))─────────────────────────┘
```

```sql title="Query" theme={null}
SELECT arrayJoin(distinctJSONPathsAndTypes(json))
FROM s3('s3://clickhouse-public-datasets/gharchive/original/2020-01-01-*.json.gz', JSONAsObject)
SETTINGS date_time_input_format = 'best_effort'
```

```text title="Response" theme={null}
┌─arrayJoin(distinctJSONPathsAndTypes(json))──────────────────┐
│ ('actor.avatar_url',['String'])                             │
│ ('actor.display_login',['String'])                          │
│ ('actor.gravatar_id',['String'])                            │
│ ('actor.id',['Int64'])                                      │
│ ('actor.login',['String'])                                  │
│ ('actor.url',['String'])                                    │
│ ('created_at',['DateTime'])                                 │
│ ('id',['String'])                                           │
│ ('org.avatar_url',['String'])                               │
│ ('org.gravatar_id',['String'])                              │
│ ('org.id',['Int64'])                                        │
│ ('org.login',['String'])                                    │
│ ('org.url',['String'])                                      │
│ ('payload.action',['String'])                               │
│ ('payload.before',['String'])                               │
│ ('payload.comment._links.html.href',['String'])             │
│ ('payload.comment._links.pull_request.href',['String'])     │
│ ('payload.comment._links.self.href',['String'])             │
│ ('payload.comment.author_association',['String'])           │
│ ('payload.comment.body',['String'])                         │
│ ('payload.comment.commit_id',['String'])                    │
│ ('payload.comment.created_at',['DateTime'])                 │
│ ('payload.comment.diff_hunk',['String'])                    │
│ ('payload.comment.html_url',['String'])                     │
│ ('payload.comment.id',['Int64'])                            │
│ ('payload.comment.in_reply_to_id',['Int64'])                │
│ ('payload.comment.issue_url',['String'])                    │
│ ('payload.comment.line',['Int64'])                          │
│ ('payload.comment.node_id',['String'])                      │
│ ('payload.comment.original_commit_id',['String'])           │
│ ('payload.comment.original_position',['Int64'])             │
│ ('payload.comment.path',['String'])                         │
│ ('payload.comment.position',['Int64'])                      │
│ ('payload.comment.pull_request_review_id',['Int64'])        │
...
│ ('payload.release.node_id',['String'])                      │
│ ('payload.release.prerelease',['Bool'])                     │
│ ('payload.release.published_at',['DateTime'])               │
│ ('payload.release.tag_name',['String'])                     │
│ ('payload.release.tarball_url',['String'])                  │
│ ('payload.release.target_commitish',['String'])             │
│ ('payload.release.upload_url',['String'])                   │
│ ('payload.release.url',['String'])                          │
│ ('payload.release.zipball_url',['String'])                  │
│ ('payload.size',['Int64'])                                  │
│ ('public',['Bool'])                                         │
│ ('repo.id',['Int64'])                                       │
│ ('repo.name',['String'])                                    │
│ ('repo.url',['String'])                                     │
│ ('type',['String'])                                         │
└─arrayJoin(distinctJSONPathsAndTypes(json))──────────────────┘
```

<h2 id="alter-modify-column-to-json-type">
  ALTER MODIFY COLUMN to JSON type
</h2>

It's possible to alter an existing table and change the type of the column to the new `JSON` type. Right now only `ALTER` from a `String` type is supported.

**Example**

```sql title="Query" theme={null}
CREATE TABLE test (json String) ENGINE=MergeTree ORDER BY tuple();
INSERT INTO test VALUES ('{"a" : 42}'), ('{"a" : 43, "b" : "Hello"}'), ('{"a" : 44, "b" : [1, 2, 3]}'), ('{"c" : "2020-01-01"}');
ALTER TABLE test MODIFY COLUMN json JSON;
SELECT json, json.a, json.b, json.c FROM test;
```

```text title="Response" theme={null}
┌─json─────────────────────────┬─json.a─┬─json.b──┬─json.c─────┐
│ {"a":"42"}                   │ 42     │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ       │
│ {"a":"43","b":"Hello"}       │ 43     │ Hello   │ ᴺᵁᴸᴸ       │
│ {"a":"44","b":["1","2","3"]} │ 44     │ [1,2,3] │ ᴺᵁᴸᴸ       │
│ {"c":"2020-01-01"}           │ ᴺᵁᴸᴸ   │ ᴺᵁᴸᴸ    │ 2020-01-01 │
└──────────────────────────────┴────────┴─────────┴────────────┘
```

<h2 id="lazy-type-hints">
  Lazy Type Hints (Experimental)
</h2>

<Note>
  This feature is experimental and requires the setting `allow_experimental_json_lazy_type_hints` to be enabled.
</Note>

When you add or modify type hints on a JSON column using `ALTER TABLE ... MODIFY COLUMN`, ClickHouse normally rewrites all data parts to materialize the new type hints. For tables with large amounts of historical data (hundreds of terabytes), this can be extremely expensive.

**Lazy type hints** allow adding type hints as a metadata-only operation without rewriting existing data:

* **Old parts**: Type hints are applied at query time by casting from `Dynamic` to the hinted type
* **New parts**: Type hints are materialized during `INSERT` operations
* **Merges**: Type hints are materialized when parts are merged

This means you can add type hints instantly, and the data will be gradually converted as normal background merges occur.

<h3 id="enabling-lazy-type-hints">
  Enabling Lazy Type Hints
</h3>

```sql theme={null}
SET allow_experimental_json_lazy_type_hints = 1;
```

<h3 id="lazy-type-hints-example">
  Example
</h3>

```sql title="Query" theme={null}
-- Create a table and insert data
CREATE TABLE test_lazy (json JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO test_lazy VALUES ('{"user_id": "123", "score": "95.5"}');

-- Enable experimental setting
SET allow_experimental_json_lazy_type_hints = 1;

-- Add type hints - this completes instantly without mutation
ALTER TABLE test_lazy MODIFY COLUMN json JSON(user_id UInt64, score Float64);

-- Query the data - type hints are applied at read time
SELECT json.user_id, toTypeName(json.user_id), json.score, toTypeName(json.score) FROM test_lazy;
```

```text title="Response" theme={null}
┌─json.user_id─┬─toTypeName(json.user_id)─┬─json.score─┬─toTypeName(json.score)─┐
│          123 │ UInt64                   │       95.5 │ Float64                │
└──────────────┴──────────────────────────┴────────────┴────────────────────────┘
```

<h3 id="verifying-no-mutation-occurred">
  Verifying No Mutation Occurred
</h3>

You can verify that the `ALTER` completed without a mutation by checking the `system.mutations` table:

```sql theme={null}
SELECT * FROM system.mutations WHERE table = 'test_lazy' AND NOT is_done;
```

With lazy type hints enabled, this query returns no rows, confirming the operation was metadata-only.

<h3 id="materializing-type-hints">
  Materializing Type Hints
</h3>

To materialize type hints in existing data, you can either:

1. **Wait for background merges**: ClickHouse will automatically materialize type hints when parts are merged
2. **Force merge**: Use `OPTIMIZE TABLE test_lazy FINAL` to merge all parts immediately
3. **Rewrite parts**: Use `ALTER TABLE test_lazy REWRITE PARTS` to rewrite parts with the new metadata

<h3 id="lazy-type-hints-limitations">
  Limitations
</h3>

* This feature is experimental and may change in future versions
* Query-time type conversion can have significant performance overhead compared to pre-materialized types, especially for large JSON objects
* The feature only applies when modifying `typed_paths` (type hints); other JSON parameters like `max_dynamic_paths`, `SKIP`, or `SKIP REGEXP` still require mutations

<h2 id="comparison-between-values-of-the-json-type">
  Comparison between values of the JSON type
</h2>

JSON objects are compared similarly to Maps.

For example:

```sql title="Query" theme={null}
CREATE TABLE test (json1 JSON, json2 JSON) ENGINE=Memory;
INSERT INTO test FORMAT JSONEachRow
{"json1" : {}, "json2" : {}}
{"json1" : {"a" : 42}, "json2" : {}}
{"json1" : {"a" : 42}, "json2" : {"a" : 41}}
{"json1" : {"a" : 42}, "json2" : {"a" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : [1, 2, 3]}}
{"json1" : {"a" : 42}, "json2" : {"a" : "Hello"}}
{"json1" : {"a" : 42}, "json2" : {"b" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : 42, "b" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : 41, "b" : 42}}

SELECT json1, json2, json1 < json2, json1 = json2, json1 > json2 FROM test;
```

```text title="Response" theme={null}
┌─json1──────┬─json2───────────────┬─less(json1, json2)─┬─equals(json1, json2)─┬─greater(json1, json2)─┐
│ {}         │ {}                  │                  0 │                    1 │                     0 │
│ {"a":"42"} │ {}                  │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"41"}          │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"42"}          │                  0 │                    1 │                     0 │
│ {"a":"42"} │ {"a":["1","2","3"]} │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"Hello"}       │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"b":"42"}          │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"a":"42","b":"42"} │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"a":"41","b":"42"} │                  0 │                    0 │                     1 │
└────────────┴─────────────────────┴────────────────────┴──────────────────────┴───────────────────────┘
```

**Note:** when 2 paths contain values of different data types, they are compared according to [comparison rule](/reference/data-types/variant#comparing-values-of-variant-data) of `Variant` data type.

<h2 id="data-skipping-indexes-for-json">
  Data skipping indexes for JSON
</h2>

[Data skipping indexes](/reference/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes) can be used with `JSON` columns in three ways:

1. **Indexes on specific subcolumns** — create a standard skip index on a known JSON path, just like on a regular column. This indexes the *values* at that path.
2. **Path-based indexes with `JSONAllPaths`** — index the *set of paths* present in each granule to skip granules that cannot contain the queried path.
3. **Value-based indexes with `JSONAllValues`** — index *all values* across all JSON paths using a [text index](/reference/engines/table-engines/mergetree-family/textindexes) to accelerate full-text search on any JSON subcolumn with a single index.

<h3 id="json-indexes-on-subcolumns">
  Indexes on specific subcolumns
</h3>

You can create a skip index on any JSON subcolumn using the same syntax as for regular columns.
Any [supported index type](/reference/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes) works (`minmax`, `set`, `bloom_filter`, `tokenbf_v1`, `ngrambf_v1`, etc.).

There are two ways to reference a JSON subcolumn in an index expression:

* **Typed path** declared in the JSON type hint — access by name directly: `json.a`.
* **Dynamic path** with explicit cast — use the `::` cast syntax: `json.b::String`.

You can also use expressions that combine multiple subcolumns, for example `json.a || json.b::String`.

<h4 id="json-indexes-on-subcolumns-example">
  Example
</h4>

```sql title="Query" theme={null}
CREATE TABLE sensor_data
(
    data JSON(sensor_id UInt32),
    INDEX idx_sensor data.sensor_id TYPE minmax GRANULARITY 1,
    INDEX idx_location data.location::String TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 1;

INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4);
INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4, 4);
```

The `minmax` index on the typed subcolumn `data.sensor_id` narrows the scan to matching granules:

```sql title="Query" theme={null}
EXPLAIN indexes = 1 SELECT * FROM sensor_data WHERE data.sensor_id < 2;
```

```text title="Response" theme={null}
...
    Indexes:
      Skip
        Name: idx_sensor
        Description: minmax GRANULARITY 1
        Parts: 1/2
        Granules: 2/8
```

The `bloom_filter` index on the cast subcolumn `data.location::String` also works:

```sql title="Query" theme={null}
EXPLAIN indexes = 1 SELECT * FROM sensor_data WHERE data.location::String = 'room_5';
```

```text title="Response" theme={null}
...
    Indexes:
      Skip
        Name: idx_location
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/8
```

<h3 id="json-indexes-jsonallpaths">
  Path-based indexes with JSONAllPaths
</h3>

[Data skipping indexes](/reference/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes) can also be created on `JSON` columns using the [`JSONAllPaths`](/reference/functions/regular-functions/json-functions#JSONAllPaths) function.
This works similarly to creating skip indexes on [`Map`](/reference/data-types/map) columns via `mapKeys` — the index stores the set of JSON paths present in each granule and uses it to skip granules that cannot contain the queried path.

<h4 id="json-indexes-jsonallpaths-supported-types">
  Supported index types
</h4>

`JSONAllPaths` can be used with the following skip index types:

* [`bloom_filter`](/reference/engines/table-engines/mergetree-family/mergetree#bloom-filter) — supports `equals`, `in`, and `IS NOT NULL`.
* [`tokenbf_v1`](/reference/engines/table-engines/mergetree-family/mergetree#token-bloom-filter) — supports `equals` and `IS NOT NULL`.
* [`ngrambf_v1`](/reference/engines/table-engines/mergetree-family/mergetree#n-gram-bloom-filter) — supports `equals` and `IS NOT NULL`.
* [`text`](/reference/engines/table-engines/mergetree-family/textindexes) (inverted index) — supports `equals`, `in` and `IS NOT NULL`.

<h4 id="json-indexes-jsonallpaths-example">
  Example
</h4>

```sql title="Query" theme={null}
CREATE TABLE events
(
    data JSON,
    INDEX idx JSONAllPaths(data) TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO events VALUES ('{"user": {"name": "Alice"}, "action": "login"}');
INSERT INTO events VALUES ('{"metric": {"cpu": 0.95}, "host": "srv1"}');
```

You can use `EXPLAIN indexes = 1` to verify that the skip index is being used. When a path exists only in one part, the index skips the other part:

```sql title="Query" theme={null}
EXPLAIN indexes = 1 SELECT * FROM events WHERE data.user.name = 'Alice';
```

```text title="Response" theme={null}
...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/2
```

When a path does not exist in any part, all parts and granules are skipped:

```sql title="Query" theme={null}
EXPLAIN indexes = 1 SELECT * FROM events WHERE data.nonexistent = 1;
```

```text title="Response" theme={null}
...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 0/2
        Granules: 0/2
```

`IS NOT NULL` also uses the index — it skips granules where the path is absent (since the value would be `NULL`):

```sql title="Query" theme={null}
EXPLAIN indexes = 1 SELECT * FROM events WHERE data.user.name IS NOT NULL;
```

```text title="Response" theme={null}
...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/2
```

<h4 id="json-indexes-jsonallpaths-how-it-works">
  How it works
</h4>

The `JSONAllPaths(json_column)` expression produces an `Array(String)` containing all paths present in a JSON value.
The skip index stores these path strings in its data structure (bloom filter or inverted index).
When a query filters on `json.some.path`, the index checks whether the string `"some.path"` is present in the index for each granule and skips granules where it is absent.

<h4 id="json-indexes-jsonallpaths-safety-with-missing-paths">
  Safety with missing paths
</h4>

When a JSON path is absent from a granule, the subcolumn evaluates to:

* `NULL` for `Dynamic` type (e.g., `json.path`) and `Nullable` typed subcolumns (e.g., `json.path.:Int64`) — comparisons with `NULL` always return false, so skipping is safe.
* The type's default value for non-`Nullable` CAST expressions (e.g., `json.path::Int64` produces `0` when the path is missing) — skipping is safe only when the compared value differs from the default. The index automatically handles this distinction.

<h3 id="json-indexes-jsonallvalues">
  Full-text search with JSONAllValues
</h3>

[Text indexes](/reference/engines/table-engines/mergetree-family/textindexes) can be used to accelerate full-text search on JSON columns via the [`JSONAllValues`](/reference/functions/regular-functions/json-functions#JSONAllValues) function.
`JSONAllValues` returns all values from a JSON column as `Array(String)`, which can be indexed by a text index.
A single index on `JSONAllValues(json_column)` covers all JSON paths, enabling full-text search on any subcolumn without creating separate indexes for each path.

See [Value-based indexes with JSONAllValues](/reference/engines/table-engines/mergetree-family/textindexes#json-indexes-jsonallvalues) in the text indexes documentation for details and examples.

<h2 id="tips-for-better-usage-of-the-json-type">
  Tips for better usage of the JSON type
</h2>

Before creating `JSON` column and loading data into it, consider the following tips:

* Investigate your data and specify as many path hints with types as you can. It will make storage and reading much more efficient.
* Think about what paths you will need and what paths you will never need. Specify paths that you won't need in the `SKIP` section, and `SKIP REGEXP` section if needed. This will improve the storage.
* Don't set the `max_dynamic_paths` parameter to very high values, as it can make storage and reading less efficient.
  While highly dependent on system parameters such as memory, CPU, etc., a general rule of thumb would be to not set `max_dynamic_paths` greater than 10 000 for the local filesystem storage and 1024 for the remote filesystem storage.

<h2 id="further-reading">
  Further Reading
</h2>

* [How we built a new powerful JSON data type for ClickHouse](https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse)
* [The billion docs JSON Challenge: ClickHouse vs. MongoDB, Elasticsearch, and more](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql)