
What is URN?

URN (Uniform Resource Name) is the URI scheme that DataHub uses to uniquely identify any resource. It has the following form:

urn:<Namespace>:<Entity Type>:<ID>
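The form above can be sketched in a few lines of Python. This is only an illustration, not DataHub's own URN classes: `make_urn` and `split_urn` are hypothetical helper names, and the sketch treats everything after the third colon as the ID.

```python
from typing import Tuple


def make_urn(namespace: str, entity_type: str, entity_id: str) -> str:
    """Compose a string of the form urn:<Namespace>:<Entity Type>:<ID>."""
    return f"urn:{namespace}:{entity_type}:{entity_id}"


def split_urn(urn: str) -> Tuple[str, str, str]:
    """Split a URN string into (namespace, entity type, ID).

    Only the first three colons are treated as separators, so an ID
    that itself contains colons (for example a nested URN) stays intact.
    """
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn":
        raise ValueError(f"not a URN of the expected form: {urn!r}")
    return parts[1], parts[2], parts[3]


print(make_urn("li", "corpuser", "jdoe"))
print(split_urn("urn:li:dataPlatform:kafka"))
```

Splitting on at most three colons is a design choice of this sketch: it keeps the helper simple while leaving the ID untouched for further parsing.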

Onboarding a new entity to GMA starts with modelling a URN specific to that entity. You can use the existing URN models for built-in entities as a reference.

Namespace

All URNs in DataHub use li as their namespace. If you fork DataHub, you can change this to a different namespace for your organization.

Entity Type

The entity type of a URN is not the same thing as an entity in the GMA context. Think of it as the object type of any resource whose instances each need a unique identifier. While you can create URNs for GMA entities, such as DatasetUrn with entity type dataset, you can also define URNs for other resources such as data platforms, as DataPlatformUrn does.

ID

ID is the unique identifier part of a URN. It is unique for a specific entity type within a specific namespace. An ID can contain a single field, or multiple fields in the case of complex URNs. A complex URN can even contain other URNs as ID fields; this type of URN is also referred to as a nested URN. For non-URN ID fields, the value can be a string, a number, or a Pegasus Enum.

Here are some example URNs with a single ID field:

urn:li:dataPlatform:kafka
urn:li:corpuser:jdoe

DatasetUrn is an example of a complex nested URN. It contains three ID fields: platform, name and fabric, where platform is itself a URN. Here are some examples:

urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)
urn:li:dataset:(urn:li:dataPlatform:hdfs,PageViewEvent,EI)
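To make the nesting concrete, here is a minimal Python sketch that pulls the three ID fields out of a dataset URN. It is not DataHub's parser: `split_id_fields` is a hypothetical helper, and it assumes that a multi-field ID is wrapped in parentheses with fields separated by top-level commas, as in the examples above.

```python
from typing import List


def split_id_fields(urn_id: str) -> List[str]:
    """Split a complex URN ID such as "(a,b,c)" into its top-level fields."""
    if not (urn_id.startswith("(") and urn_id.endswith(")")):
        return [urn_id]  # single-field ID, nothing to split

    fields: List[str] = []
    current: List[str] = []
    depth = 0  # nesting depth inside the outer parentheses
    for ch in urn_id[1:-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two top-level fields.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


urn = "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)"
urn_id = urn.split(":", 3)[3]  # everything after urn:<Namespace>:<Entity Type>:
platform, name, fabric = split_id_fields(urn_id)
print(platform, name, fabric)
```

Tracking the parenthesis depth means a nested URN that has multiple fields of its own is returned as one field instead of being split at its inner commas.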

Restrictions

There are a few restrictions when creating a URN:

  1. Commas are reserved characters in URN fields: ,
  2. Parentheses are reserved characters in URN fields: ( , )
  3. Colons are reserved characters in URN fields: :

Do not use these characters in field values when creating or generating URNs. If a value contains any of them, one approach is to URL-encode those characters.
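As an illustration of the URL-encoding approach, the following Python sketch percent-encodes a field value before placing it in a URN. The helper names and the sample value `db:schema,table(v2)` are hypothetical. Passing `safe=""` to `urllib.parse.quote` is an assumption of this sketch, and it also encodes other characters such as `/` and spaces, which may be more than your deployment needs.

```python
from urllib.parse import quote, unquote


def encode_urn_field(value: str) -> str:
    """Percent-encode a field value so it contains no reserved URN characters."""
    # safe="" also encodes "/", which quote() would otherwise leave as-is.
    return quote(value, safe="")


def decode_urn_field(value: str) -> str:
    """Reverse encode_urn_field."""
    return unquote(value)


raw_name = "db:schema,table(v2)"  # hypothetical name containing reserved characters
encoded = encode_urn_field(raw_name)
urn = f"urn:li:dataset:(urn:li:dataPlatform:hdfs,{encoded},PROD)"
print(urn)
```

Whoever reads the URN back has to apply the matching decode step to recover the original value, so the encoding convention should be agreed on by both producers and consumers.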