Docs Menu
Docs Home
/
BI Connector
/

Document Relational Definition Language

On this page

  • File Format
  • Field Types
  • Embedded Documents
  • Custom Filters
  • Aggregation Pipelines

Document Relational Definition Language (DRDL) defines a relational view of a MongoDB schema.

mongodrdl samples documents from your MongoDB collections and derives a DRDL file from those documents. mongosqld then uses the schema defined in DRDL files to allow MySQL clients to query your MongoDB data.

The DRDL file lists your databases, tables, and columns in YAML format.

schema:
- db: <database name>
tables:
- table: <SQL table name>
collection: <MongoDB collection name>
pipeline:
- <optional pipeline elements>
columns:
- Name: <MongoDB field name>
MongoType: <MongoDB field type>
SqlName: <mapped SQL column name>
SqlType: <mapped SQL column type>

Given documents of the following shape in the collection abc in the database test:

{
"_id": ObjectId(),
"close": 7.45,
"detail": { "a": 2, "b": 3 }
}

Run mongodrdl to generate a schema based on this collection:

mongodrdl -d test -c abc -o schema.drdl

The generated schema file (schema.drdl) looks similar to the following:

schema:
- db: test
tables:
- table: abc
collection: abc
pipeline: []
columns:
- Name: _id
MongoType: bson.ObjectId
SqlName: _id
SqlType: varchar
- Name: close
MongoType: float64
SqlName: close
SqlType: numeric
- Name: detail.a
MongoType: float64
SqlName: detail.a
SqlType: numeric
- Name: detail.b
MongoType: float64
SqlName: detail.b
SqlType: numeric

BI Connector maps fields that always contain the same data type into the relational model. Schema generation deals specially with the following cases:

Numeric

BI Connector uses the most precise numeric type that matches the sampled documents.

If a field in a collection always has the same data type, BI Connector uses that type.

If a field in a collection can contain either floating point values or integers, BI Connector uses type numeric.

Dates

BI Connector treats any field of type data_date as the SQL timestamp type.

Timestamps

BI Connector ignores any field of type data_timestamp.

ObjectID

BI Connector treats any field of type data_oid as the SQL type varchar.

UUID

BI Connector treats any field of type UUID as the SQL type varchar.

Geospatial

If a collection contains a geospatial index, BI Connector maps the indexed field into an array of numeric longitude-latitude coordinates.

See Geospatial Data for an example.

BI Connector does not recognize geospatial fields when reading from a view.

Heterogeneous Fields

If a field contains inconsistent types, BI Connector chooses the most frequently sampled type.

If a field can contain either a type or an array of that type, the generated schema always specifies that the field contains an array.

To learn more, see How do I skip data incompatible with my DRDL type definition?.

BI Connector maps embedded documents to simple fields that have a . separator character, making them appear similar to the way you would reference them using dot notation in a MongoDB query.

While Tableau properly quotes identifiers, within ad-hoc SQL expressions you must double-quote every identifier that contains . characters or mixed-case characters.

Consider the following document:

{
"_id": 1,
"familyName": "Partridge",
"hometown" : "Hollywood Hills",
"address" : { "street": "123 Main Street",
"city" : "Hollywood",
"state" : "CA",
"zip" : "90210" },
"members_since" : ISODate("2002-04-12T00:00:00Z")
}

Running mongodrdl on a collection containing this document results in the following fields in the generated schema:

_id

numeric

familyName

varchar

hometown

varchar

address.street

varchar

address.city

varchar

address.state

varchar

address.zip

varchar

members_since

timestamp

BI Connector exposes arrays to business intelligence tools using two collections: one without the array, and the other having one document per array element.

If you run mongodrdl on a collection named families which contains the following document:

{
"_id": 1,
"familyName": "Partridge",
"hometown" : "Hollywood Hills",
"familyMembers" : [
{
"firstname" : "Shirley",
"age" : 42,
"attributes" : [
{ "name" : "instrument", "value" : "singer" },
{ "name" : "role", "value" : "mom" }
]
},
{
"firstname" : "Keith",
"age" : 18,
"attributes" : [
{ "name" : "instrument", "value" : "guitar" },
{ "name" : "role", "value" : "son" }
]
},
{
"firstname" : "Laurie",
"age" : 16,
"attributes" : [
{ "name" : "instrument", "value" : "keyboard" },
{ "name" : "role", "value" : "sister" }
]
}]
}

This results in the following three tables:

families

_id

numeric

familyName

varchar

hometown

varchar

families_familyMembers

_id

numeric

familyMembers.age

numeric

familyMembers.firstname

varchar

familyMembers_idx

numeric

families_familyMembers_attributes

_id

numeric

familyMembers.attributes.name

varchar

familyMembers.attributes.value

varchar

familyMembers.attributes_idx

numeric

familyMembers_idx

numeric

You can join these tables together to view the data in a denormalized format. For example, you can list the people named in the above schema together with their family information using the following query:

SELECT f.*, m.`familyMembers.firstname`
FROM families_familyMembers m
JOIN families f
ON m._id = f._id;

If you provide the --preJoined option to mongodrdl, BI Connector adds the fields in the containing document to each array element's document, thus "pre-joining" the table.

In the previous example, the tables would contain the following additional columns:

families_familyMembers

familyName

varchar

hometown

varchar

families_familyMembers_attributes

familyMembers.age

numeric

familyMembers.firstname

varchar

familyMembers_idx

numeric

familyName

varchar

hometown

varchar

You can add a column of type mongo.Filter to a collection in your DRDL file. This column type allows you to perform a custom $match query.

For example, given the following schema describing a cloud of points with up to three components:

schema:
- db: test
tables:
- table: points
collection: points
pipeline: []
columns:
- Name: _id
MongoType: bson.ObjectId
SqlName: _id
SqlType: varchar
- Name: x
MongoType: float64
SqlName: x
SqlType: numeric
- Name: "y"
MongoType: float64
SqlName: "y"
SqlType: numeric
- Name: z
MongoType: float64
SqlName: z
SqlType: numeric
- Name: filter
MongoType: mongo.Filter
SqlName: filter
SqlType: varchar

You can select only three-dimensional points using the following query:

SELECT x, y, z
FROM points
WHERE filter='{"z": {"$exists": true}}';

MongoDB 3.4 introduces Read-Only Views that you can use to filter incompatible data.

For example, you can create a view in the test database that contains only documents containing a number in the grade field of a grades collection:

db.runCommand( { create: "numericGrades", viewOn: "grades", pipeline: [ { "$match": { "grade": { "$type": "number" } } } ] } )

You can then use mongodrdl to generate a schema from this view as you would a collection:

mongodrdl -d test -c numericGrades

BI Connector can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.

For example, consider a simple document in a collection named simpleFamilies:

{
"_id": 1,
"familyName": "Partridge",
"familyMembers" : [ "Shirley", "Keith", "Laurie"]
}

mongodrdl generates a schema with the tables simpleFamilies and simpleFamilies_familyMembers.

The table simpleFamilies_familyMembers enumerates each family member and has the following pipeline:

pipeline:
- $unwind:
includeArrayIndex: familyMembers_idx
path: $familyMembers

This pipeline uses $unwind to create a new record for each member of familyMembers. The schema tracks the array index in the field familyMembers_idx.

If a collection contains a 2d or 2dsphere geospatial index, BI Connector maps the indexed field into an array of numeric longitude-latitude coordinates.

Given the following collection:

db.points.createIndex( { pos : "2dsphere" } )
db.points.insertOne({
pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
name: "Central Park",
category : "Parks"
})

BI Connector generates the following schema:

schema:
- db: test
tables:
- table: points
collection: points
pipeline: []
columns:
- Name: _id
MongoType: bson.ObjectId
SqlName: _id
SqlType: varchar
- Name: category
MongoType: string
SqlName: category
SqlType: varchar
- Name: name
MongoType: string
SqlName: name
SqlType: varchar
- Name: pos.coordinates
MongoType: geo.2darray
SqlName: pos.coordinates
SqlType: numeric[]

BI Connector does not recognize geospatial fields when reading from a view.

Back

Configure TLS for BI Connector