Posted: July 25, 2025

Data Cloud SQL

Table of Contents:

Best Practices
Aggregating Complex Data
De-Duplicating Data
Dynamically Acting on Data Using Exists/Not Exists
Introduce Constants With Cross Join
Addressing Product Gaps
Be Careful When Comparing NULL In Case Statements
Know The Difference Between Different Unions
In Conclusion

If you had told somebody even last year that Structured Query Language (SQL) was going to be a make-or-break skill for advanced Data Cloud use-cases, I think many people would have laughed.

In this rapidly evolving world of development, however, Data Cloud’s strong tie-ins to SQL has quickly made SQL one of the best and easiest ways to:

Unify structured and unstructured data with disparate sources when answering user queries
Perform fine-tuned RAG using vector_search (and hybrid_search!)
Address product gaps that make using the SOQL-to-SQL converter impossible

Best Practices

Hopefully some of these save you time and energy:

Use the Query Editor app in Data Cloud. It (currently, at least) has some nuances that aren’t alway required when issuing queries via the API (like requiring column and table names to be double quoted), but it’s a great way to prototype queries
It’s possible to query data streams, but you probably shouldn’t — data models index your relationship fields and using the __dlm versions of tables can significantly improve the performance of queries (we’ve benchmarked the performance improvement at > 90% when join statements are involved)
Jonathan Gillespie used a simple prompt to come up with an object-oriented SqlQueryBuilder class. I would absolutely recommend trying that, or taking the time to incorporate an object like this into your workflow. It can really help to create a simpler API for common Data Cloud queries, like vector_search:

Expand to view full source for SqlQueryBuilder

@SuppressWarnings(
  'PMD.CognitiveComplexity, PMD.CyclomaticComplexity, PMD.ExcessivePublicCount'
)
public class SqlQueryBuilder {
  private static final String DELIMITER_OVERRIDE = ' ';

  @TestVisible
  private String indentationDelimiter = '\n\t';
  private String newLineCharacter = '\n';

  private final Map<String, SqlQueryBuilder> cteQueries = new Map<String, SqlQueryBuilder>();
  private final String tableName;
  private final String tableAlias;
  private final List<String> selectFields = new List<String>();
  private final List<String> whereClauses = new List<String>();
  private final List<String> joinClauses = new List<String>();
  private final List<String> groupByFields = new List<String>();
  private final List<String> havingClauses = new List<String>();
  private final List<String> orderByFields = new List<String>();
  private final List<SqlQueryBuilder> unionQueries = new List<SqlQueryBuilder>();

  private Integer limitCount;
  private Integer offsetCount;

  private enum JoinType {
    FULL_JOIN,
    INNER_JOIN,
    LEFT_JOIN,
    RIGHT_JOIN
  }

  public virtual class Filter {
    public final String field { get; protected set; }
    public final String operator { get; protected set; }
    public final Object value { get; protected set; }

    public Filter(String field, String operator, Object value) {
      this.field = field;
      this.operator = operator;
      this.value = value;
    }

    @SuppressWarnings('PMD.EmptyStatementBlock')
    protected Filter() {
    }

    public virtual override String toString() {
      return this.field + ' ' + this.operator + ' ' + this.value;
    }
  }

  public class OrFilter extends Filter {
    public final List<Filter> filters = new List<Filter>();

    public OrFilter(List<Filter> filters) {
      this.filters.addAll(filters);
    }

    public override String toString() {
      List<String> transformedFilters = new List<String>();
      for (Filter innerFilter : this.filters) {
        transformedFilters.add(innerFilter.toString());
      }
      return '(' + String.join(transformedFilters, ' OR ') + ')';
    }
  }

  public virtual class ExistsFilter extends Filter {
    private final SqlQueryBuilder otherQuery;

    public ExistsFilter(SqlQueryBuilder otherQuery) {
      super();
      this.otherQuery = otherQuery;
    }

    public virtual override String toString() {
      this.otherQuery.indentationDelimiter = DELIMITER_OVERRIDE;
      this.otherQuery.newLineCharacter = DELIMITER_OVERRIDE;
      return 'EXISTS (' + this.otherQuery.build() + ')';
    }
  }

  public class NotExistsFilter extends ExistsFilter {
    public NotExistsFilter(SqlQueryBuilder otherQuery) {
      super(otherQuery);
    }

    public override String toString() {
      return 'NOT ' + super.toString();
    }
  }

  public SqlQueryBuilder(String tableName, String tableAlias) {
    this.tableName = tableName;
    this.tableAlias = tableAlias;
  }

  public SqlQueryBuilder(String tableName) {
    this.tableName = tableName;
  }

  @SuppressWarnings('PMD.ExcessiveParameterList')
  public static SqlQueryBuilder vectorSearch(
    String tableName,
    String tableAlias,
    String searchString,
    Integer topKResults,
    String optionalPrefilteringColumn
  ) {
    List<String> commaSeparated = new List<String>{
      'TABLE(' +
      tableName +
      ')',
      escapeWithSingleQuotes(searchString),
      escapeWithSingleQuotes(optionalPrefilteringColumn ?? ''),
      '' + topKResults
    };
    return new SqlQueryBuilder(
      'vector_search(' + String.join(commaSeparated, ',') + ')',
      tableAlias
    );
  }

  public SqlQueryBuilder selectFields(Map<String, String> fieldToAlias) {
    for (String field : fieldToAlias.keySet()) {
      String fieldAlias = fieldToAlias.get(field);
      this.selectField(field, fieldAlias);
    }
    return this;
  }

  public SqlQueryBuilder selectFields(List<String> fields) {
    this.selectFields.addAll(fields);
    return this;
  }

  public SqlQueryBuilder selectField(String field) {
    selectFields.add(field);
    return this;
  }

  public SqlQueryBuilder selectField(String field, String fieldAlias) {
    String combinedFieldAlias = String.isBlank(fieldAlias)
      ? ''
      : ' AS ' + fieldAlias;
    return this.selectField(field + combinedFieldAlias);
  }

  public SqlQueryBuilder filterWhere(
    String field,
    String operator,
    Object value
  ) {
    return this.filterWhere(
      new Filter(field, operator, this.convertToSqlValue(value))
    );
  }

  public SqlQueryBuilder filterWhere(Filter filter) {
    this.whereClauses.add(filter.toString());
    return this;
  }

  public SqlQueryBuilder innerJoin(
    String tableName,
    String tableAlias,
    String joinOnCondition
  ) {
    return this.addJoinClause(
      JoinType.INNER_JOIN,
      tableName,
      tableAlias,
      joinOnCondition
    );
  }

  public SqlQueryBuilder leftJoin(
    String tableName,
    String tableAlias,
    String joinOnCondition
  ) {
    return this.addJoinClause(
      JoinType.LEFT_JOIN,
      tableName,
      tableAlias,
      joinOnCondition
    );
  }

  public SqlQueryBuilder rightJoin(
    String tableName,
    String tableAlias,
    String joinOnCondition
  ) {
    return this.addJoinClause(
      JoinType.RIGHT_JOIN,
      tableName,
      tableAlias,
      joinOnCondition
    );
  }

  public SqlQueryBuilder fullJoin(
    String tableName,
    String tableAlias,
    String joinOnCondition
  ) {
    return this.addJoinClause(
      JoinType.FULL_JOIN,
      tableName,
      tableAlias,
      joinOnCondition
    );
  }

  public SqlQueryBuilder groupBy(String field) {
    this.groupByFields.add(field);
    return this;
  }

  public SqlQueryBuilder havingAggregate(
    String aggregate,
    String operator,
    Object value
  ) {
    String sqlValue = this.convertToSqlValue(value);
    this.havingClauses.add(aggregate + ' ' + operator + ' ' + sqlValue);
    return this;
  }

  public SqlQueryBuilder orderByAscending(String field) {
    return this.addOrderByClause(field, 'ASC');
  }

  public SqlQueryBuilder orderByDescending(String field) {
    return this.addOrderByClause(field, 'DESC');
  }

  public SqlQueryBuilder offsetBy(Integer offsetCount) {
    this.offsetCount = offsetCount;
    return this;
  }

  public SqlQueryBuilder limitTo(Integer limitCount) {
    this.limitCount = limitCount;
    return this;
  }

  public SqlQueryBuilder union(SqlQueryBuilder otherQuery) {
    this.unionQueries.add(otherQuery);
    return this;
  }

  public SqlQueryBuilder withCTE(String cteName, SqlQueryBuilder cteQuery) {
    this.cteQueries.put(cteName, cteQuery);
    return this;
  }

  public String build() {
    List<String> queryPieces = this.buildBaseQueryPieces();

    this.buildJoinClauses(queryPieces);
    this.buildWhereClauses(queryPieces);
    this.buildGroupByClauses(queryPieces);
    this.buildHavingClauses(queryPieces);
    this.buildOrderByClauses(queryPieces);
    this.buildLimitClause(queryPieces);
    this.buildOffsetClause(queryPieces);

    return this.buildFinalQuery(queryPieces);
  }

  @SuppressWarnings('PMD.ExcessiveParameterList')
  private SqlQueryBuilder addJoinClause(
    JoinType targetJoinType,
    String tableName,
    String tableAlias,
    String joinOnCondition
  ) {
    // TODO for subqueries this spot doesn't respect delimiter overrides
    // which can make the query hard to read when it's logged
    joinClauses.add(
      targetJoinType.name().replace('_', ' ') +
        ' ' +
        tableName +
        ' ' +
        tableAlias +
        this.indentationDelimiter +
        'ON ' +
        joinOnCondition
    );
    return this;
  }

  private SqlQueryBuilder addOrderByClause(
    String field,
    String orderDirection
  ) {
    this.orderByFields.add(field + ' ' + orderDirection);
    return this;
  }

  private List<String> buildBaseQueryPieces() {
    String combinedQueryFields = this.selectFields.isEmpty()
      ? ' *'
      : this.indentationDelimiter +
        String.join(this.selectFields, ',' + this.indentationDelimiter);
    String combinedTableName =
      this.tableName +
      (String.isBlank(this.tableAlias) ? '' : ' AS ' + this.tableAlias);
    String combinedCTEs = 'WITH ';
    String concatSpacer = ',' + this.newLineCharacter;
    for (String cteName : this.cteQueries.keySet()) {
      combinedCTEs +=
        cteName +
        ' AS (' +
        this.cteQueries.get(cteName).build().trim() +
        ')' +
        concatSpacer;
    }
    List<String> baseQueryPieces = new List<String>();
    if (combinedCTEs != 'WITH ') {
      baseQueryPieces.add(
        combinedCTEs.removeEnd(concatSpacer) + this.newLineCharacter
      );
    }
    baseQueryPieces.add('SELECT' + combinedQueryFields);
    baseQueryPieces.add('FROM ' + combinedTableName);
    return baseQueryPieces;
  }

  private void buildJoinClauses(List<String> queryPieces) {
    if (this.joinClauses.isEmpty() == false) {
      queryPieces.add(String.join(this.joinClauses, this.newLineCharacter));
    }
  }

  private void buildWhereClauses(List<String> queryPieces) {
    if (this.whereClauses.isEmpty() == false) {
      queryPieces.add(
        'WHERE' +
          this.indentationDelimiter +
          String.join(this.whereClauses, this.indentationDelimiter + 'AND ')
      );
    }
  }

  private void buildGroupByClauses(List<String> queryPieces) {
    if (this.groupByFields.isEmpty() == false) {
      queryPieces.add(
        'GROUP BY' +
          this.indentationDelimiter +
          String.join(this.groupByFields, ',' + this.indentationDelimiter)
      );
    }
  }

  private void buildHavingClauses(List<String> queryPieces) {
    if (this.havingClauses.isEmpty() == false) {
      queryPieces.add(
        'HAVING ' +
        String.join(this.havingClauses, this.indentationDelimiter + 'AND ')
      );
    }
  }

  private void buildOrderByClauses(List<String> queryPieces) {
    if (this.orderByFields.isEmpty() == false) {
      queryPieces.add(
        'ORDER BY' +
          this.indentationDelimiter +
          String.join(this.orderByFields, ',' + this.indentationDelimiter)
      );
    }
  }

  private void buildLimitClause(List<String> queryPieces) {
    if (this.limitCount != null) {
      queryPieces.add('LIMIT ' + this.limitCount);
    }
  }

  private void buildOffsetClause(List<String> queryPieces) {
    if (this.offsetCount != null) {
      queryPieces.add('OFFSET ' + this.offsetCount);
    }
  }

  @SuppressWarnings('PMD.AvoidDebugStatements')
  private String buildFinalQuery(List<String> queryPieces) {
    String query = String.join(queryPieces, this.newLineCharacter);

    if (this.unionQueries.isEmpty() == false) {
      query = '(' + query + ')';
      for (SqlQueryBuilder unionQuery : this.unionQueries) {
        query +=
          this.newLineCharacter +
          'UNION' +
          this.newLineCharacter +
          '(' +
          unionQuery.build() +
          ')';
      }
    }
    if (this.indentationDelimiter != DELIMITER_OVERRIDE) {
      Logger.debug(
        'SqlDataBuilder final query:\n' + query
      );
    }
    return query;
  }

  private String convertToSqlValue(Object value) {
    if (value instanceof Datetime) {
      return 'TIMESTAMP \'' +
        ((Datetime) value).formatGmt('yyyy-MM-dd\'T\'HH:mm:ss') +
        '\'';
    } else {
      return String.valueOf(value);
    }
  }

  private static String escapeWithSingleQuotes(Object value) {
    return '\'' + String.escapeSingleQuotes(String.valueOf(value)) + '\'';
  }
}

Expand to view full contents of the test class


@IsTest
private class SqlQueryBuilderTest {
  @IsTest
  static void defaultsToSelectStar() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
      'Some_Data_Cloud_Object__dlm'
    );

    String generatedQuery = queryBuilder.build();

    String expectedQuery = 'SELECT *\nFROM Some_Data_Cloud_Object__dlm';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void supportsTableAliasing() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
      'Some_Data_Cloud_Object__dlm',
      'someTable'
    );

    String generatedQuery = queryBuilder.build();

    String expectedQuery = 'SELECT *\nFROM Some_Data_Cloud_Object__dlm AS someTable';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsSelectedFields() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'someTable'
      )
      .selectField('someTable.Id__c')
      .selectField('someTable.Name__c', 'The_Name_Field')
      .selectFields(
        new List<String>{ 'Fake_Field_1_c__c', 'Fake_Field_2_c__c' }
      )
      .selectFields(
        new Map<String, String>{
          'More_Field_1_c__c' => 'more_1',
          'More_Field_2_c__c' => 'more_2'
        }
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\tsomeTable.Id__c,\n' +
      '\tsomeTable.Name__c AS The_Name_Field,\n' +
      '\tFake_Field_1_c__c,\n' +
      '\tFake_Field_2_c__c,\n' +
      '\tMore_Field_1_c__c AS more_1,\n' +
      '\tMore_Field_2_c__c AS more_2\n' +
      'FROM Some_Data_Cloud_Object__dlm AS someTable';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsDateWhereClause() {
    Date todayish = System.today();
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('Some_Date_Field__c', '>', todayish);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tSome_Date_Field__c > TIMESTAMP \'' +
      ((Datetime) todayish).formatGmt('yyyy-MM-dd\'T\'HH:mm:ss') +
      '\'';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsDatetimeWhereClause() {
    Datetime nowish = System.now();
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('CreatedDate__c', '>', nowish);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tCreatedDate__c > TIMESTAMP \'' +
      nowish.formatGmt('yyyy-MM-dd\'T\'HH:mm:ss') +
      '\'';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsDecimalWhereClause() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('Some_Decimal_Field__c', '>', 123.45);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tSome_Decimal_Field__c > 123.45';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsIntegerWhereClause() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('Some_Decimal_Field__c', '>', 999999);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tSome_Decimal_Field__c > 999999';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsStringWhereClause() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('Name__c', '=', '\'Some escaped string value\'');

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tName__c = \'Some escaped string value\'';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsMultipleWhereClause() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .filterWhere('Some_Number__c', '=', 123.45)
      .filterWhere(
        'LOWER(Name__c)',
        'LIKE',
        'LOWER(\'%SoMe VaLuE wItH mIxEd CaSe%\')'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'WHERE\n' +
      '\tSome_Number__c = 123.45\n' +
      '\tAND LOWER(Name__c) LIKE LOWER(\'%SoMe VaLuE wItH mIxEd CaSe%\')';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsMultipleWhereClauseWithOr() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder('table')
      .filterWhere(
        new SqlQueryBuilder.OrFilter(
          new List<SqlQueryBuilder.Filter>{
            new SqlQueryBuilder.Filter('fieldA', '=', 1),
            new SqlQueryBuilder.Filter('fieldB', '<', 2)
          }
        )
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM table\n' +
      'WHERE\n' +
      '\t(fieldA = 1 OR fieldB < 2)';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void addsMultipleWhereClauseEvenMoreComplicated() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder('table')
      .filterWhere('fieldAA', '=', true)
      .filterWhere(
        new SqlQueryBuilder.OrFilter(
          new List<SqlQueryBuilder.Filter>{
            new SqlQueryBuilder.Filter('fieldA', '=', 1),
            new SqlQueryBuilder.Filter('fieldB', '<', 2)
          }
        )
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM table\n' +
      'WHERE\n' +
      '\tfieldAA = true\n' +
      '\tAND (fieldA = 1 OR fieldB < 2)';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void innerJoinsTwoTables() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .innerJoin(
        'Another_Data_Cloud_Object__dlm',
        'b',
        'a.Some_Lookup_Field_c__c = b.Id__c'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'INNER JOIN Another_Data_Cloud_Object__dlm b\n' +
      '\tON a.Some_Lookup_Field_c__c = b.Id__c';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void leftJoinsTwoTables() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .leftJoin(
        'Another_Data_Cloud_Object__dlm',
        'b',
        'a.Some_Lookup_Field_c__c = b.Id__c'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'LEFT JOIN Another_Data_Cloud_Object__dlm b\n' +
      '\tON a.Some_Lookup_Field_c__c = b.Id__c';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void rightJoinsTwoTables() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .rightJoin(
        'Another_Data_Cloud_Object__dlm',
        'b',
        'a.Some_Lookup_Field_c__c = b.Id__c'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'RIGHT JOIN Another_Data_Cloud_Object__dlm b\n' +
      '\tON a.Some_Lookup_Field_c__c = b.Id__c';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void fullJoinsTwoTables() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .fullJoin(
        'Another_Data_Cloud_Object__dlm',
        'b',
        'a.Some_Lookup_Field_c__c = b.Id__c'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'FULL JOIN Another_Data_Cloud_Object__dlm b\n' +
      '\tON a.Some_Lookup_Field_c__c = b.Id__c';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void joinsMultipleTablesWithAllSupportedJoinTypes() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .innerJoin(
        'Another_Data_Cloud_Object__dlm',
        'b',
        'a.Some_Lookup_Field_c__c = b.Id__c'
      )
      .leftJoin(
        'A_Third_Data_Cloud_Object__dlm',
        'c',
        'a.Another_Lookup_Field_c__c = c.Id__c'
      )
      .rightJoin(
        'A_Fourth_Data_Cloud_Object__dlm',
        'd',
        'c.One_More_Lookup_Field_c__c = d.Id__c'
      )
      .fullJoin(
        'A_Fifth_Data_Cloud_Object__dlm',
        'e',
        'c.Yet_Another_Lookup_Field_c__c = e.Id__c'
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery = String.join(
      new List<String>{
        'SELECT *\nFROM Some_Data_Cloud_Object__dlm AS a',
        'INNER JOIN Another_Data_Cloud_Object__dlm b\n\tON a.Some_Lookup_Field_c__c = b.Id__c',
        'LEFT JOIN A_Third_Data_Cloud_Object__dlm c\n\tON a.Another_Lookup_Field_c__c = c.Id__c',
        'RIGHT JOIN A_Fourth_Data_Cloud_Object__dlm d\n\tON c.One_More_Lookup_Field_c__c = d.Id__c',
        'FULL JOIN A_Fifth_Data_Cloud_Object__dlm e\n\tON c.Yet_Another_Lookup_Field_c__c = e.Id__c'
      },
      '\n'
    );
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void groupsBySingleField() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Some_Field_c__c')
      .selectField('COUNT(a.Id)', 'RecordCount')
      .groupBy('a.Some_Field_c__c')
      .havingAggregate('COUNT(a.Id)', '>', 10);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Some_Field_c__c,\n' +
      '\tCOUNT(a.Id) AS RecordCount\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'GROUP BY\n' +
      '\ta.Some_Field_c__c\n' +
      'HAVING COUNT(a.Id) > 10';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void groupsByMultipleFields() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Some_Field_c__c')
      .selectField('a.Another_Field_c__c')
      .selectField('COUNT(a.Id__c)', 'RecordCount')
      .groupBy('a.Some_Field_c__c')
      .groupBy('a.Another_Field_c__c');

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c,\n' +
      '\tCOUNT(a.Id__c) AS RecordCount\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'GROUP BY\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void groupsByHavingSingleAggregate() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Some_Field_c__c')
      .selectField('a.Another_Field_c__c')
      .selectField('COUNT(a.Id)', 'RecordCount')
      .groupBy('a.Some_Field_c__c')
      .groupBy('a.Another_Field_c__c')
      .havingAggregate('COUNT(a.Id)', '>', 10);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c,\n' +
      '\tCOUNT(a.Id) AS RecordCount\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'GROUP BY\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c\n' +
      'HAVING COUNT(a.Id) > 10';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void groupsByHavingMultipleAggregates() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Some_Field_c__c')
      .selectField('a.Another_Field_c__c')
      .selectField('COUNT(a.Id)', 'RecordCount')
      // ANSI SQL + ConnectApi require a group-by statement when using having
      // Currently, SqlQueryBuilder doesn't check or handle this, but the ConnectApi
      // will throw a runtime EXPRESSION_NOT_AGGREGATE error, with the message "Bad SQL Grammar" 😢
      .groupBy('a.Some_Field_c__c')
      .groupBy('a.Another_Field_c__c')
      .havingAggregate('COUNT(a.Id__c)', '>', 10)
      .havingAggregate('COUNT(a.Id__c)', '<', 999);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c,\n' +
      '\tCOUNT(a.Id) AS RecordCount\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'GROUP BY\n' +
      '\ta.Some_Field_c__c,\n' +
      '\ta.Another_Field_c__c\n' +
      'HAVING COUNT(a.Id__c) > 10\n' +
      '\tAND COUNT(a.Id__c) < 999';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void ordersBySingleField() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Id')
      .orderByAscending('a.Name');

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Id\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'ORDER BY\n' +
      '\ta.Name ASC';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void ordersByMultipleFields() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .orderByAscending('a.Name__c')
      .orderByDescending('a.CreatedDate__c');

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm\n' +
      'ORDER BY\n' +
      '\ta.Name__c ASC,\n' +
      '\ta.CreatedDate__c DESC';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void offsetsRecordResults() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Id')
      // ANSI SQL + ConnectApi require an order-by statement when using offset
      // Currently, SqlQueryBuilder doesn't check or handle this, but the ConnectApi
      // will throw a runtime ANSI_SQL_API_BAD_REQUEST error, with the message "Offset is not supported without order by."
      .orderByAscending('a.Name')
      .offsetBy(987);

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT\n' +
      '\ta.Id\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'ORDER BY\n' +
      '\ta.Name ASC\n' +
      'OFFSET 987';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void limitsRecordResults() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .limitTo(10);

    String generatedQuery = queryBuilder.build();

    String expectedQuery = 'SELECT *\nFROM Some_Data_Cloud_Object__dlm\nLIMIT 10';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void unionsTwoQueries() {
    SqlQueryBuilder query1 = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .selectField('a.Id__c')
      .selectField('a.Name__c');
    SqlQueryBuilder query2 = new SqlQueryBuilder(
        'Another_Data_Cloud_Object__dlm',
        'b'
      )
      .selectField('b.Id__c')
      .selectField('b.LastName__c');
    query1.union(query2);

    String generatedQuery = query1.build();

    String expectedQuery =
      '(SELECT\n' +
      '\ta.Id__c,\n' +
      '\ta.Name__c\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a)\n' +
      'UNION\n' +
      '(SELECT\n' +
      '\tb.Id__c,\n' +
      '\tb.LastName__c\n' +
      'FROM Another_Data_Cloud_Object__dlm AS b)';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void vectorSearchOutputsCorrectly() {
    Integer topKResults = 10;
    String topic = 'test';
    Map<String, String> queryFields = new Map<String, String>{
      'v.Id__c' => 'Id',
      'v.Name__c' => 'Name'
    };
    String vectorToChunkJoin = 'v.Id__c = c.Vector_Id__c';

    String vectorSearchQuery = SqlQueryBuilder.vectorSearch(
        'some_vector__dlm',
        'v',
        topic,
        topKResults,
        null
      )
      .selectFields(queryFields)
      .fullJoin('some_vector_chunk__dlm', 'c', vectorToChunkJoin)
      .build();

    Assert.areEqual(
      String.join(
        new List<String>{
          'SELECT\n\t',
          'v.Id__c AS Id,\n\tv.Name__c AS Name\n',
          'FROM vector_search(TABLE(some_vector__dlm),\'test\',\'\',10) AS v\n',
          'FULL JOIN some_vector_chunk__dlm c\n\tON v.Id__c = c.Vector_Id__c'
        },
        ''
      ),
      vectorSearchQuery
    );
  }

  @IsTest
  static void existsFilterOutputsCorrectly() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .filterWhere(
        new SqlQueryBuilder.ExistsFilter(
          new SqlQueryBuilder('Another_Data_Cloud_Object__dlm', 'b')
            .selectField('1')
            .filterWhere('a.Some_Lookup_Field_c__c', '=', 'b.Id__c')
        )
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'WHERE\n' +
      '\tEXISTS (SELECT 1 FROM Another_Data_Cloud_Object__dlm AS b WHERE a.Some_Lookup_Field_c__c = b.Id__c)';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void notExistsFilterOutputsCorrectly() {
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm',
        'a'
      )
      .filterWhere(
        new SqlQueryBuilder.NotExistsFilter(
          new SqlQueryBuilder('Another_Data_Cloud_Object__dlm', 'b')
            .selectField('1')
            .filterWhere('a.Some_Lookup_Field_c__c', '=', 'b.Id__c')
        )
      );

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'SELECT *\n' +
      'FROM Some_Data_Cloud_Object__dlm AS a\n' +
      'WHERE\n' +
      '\tNOT EXISTS (SELECT 1 FROM Another_Data_Cloud_Object__dlm AS b WHERE a.Some_Lookup_Field_c__c = b.Id__c)';
    Assert.areEqual(expectedQuery, generatedQuery);
  }

  @IsTest
  static void withCTEOutputsCorrectly() {
    SqlQueryBuilder cteQuery = new SqlQueryBuilder(
        'Some_Data_Cloud_Object__dlm'
      )
      .selectField('Id__c');
    SqlQueryBuilder queryBuilder = new SqlQueryBuilder(
        'Some_Other_Cloud_Object__dlm',
        'b'
      )
      .withCTE('cteAlias', cteQuery)
      .withCTE('cteAlias2', cteQuery)
      .innerJoin('cteAlias', 'a', 'b.Id__c = a.Id__c');

    String generatedQuery = queryBuilder.build();

    String expectedQuery =
      'WITH cteAlias AS (SELECT  Id__c FROM Some_Data_Cloud_Object__dlm), ' +
      'cteAlias2 AS (SELECT  Id__c FROM Some_Data_Cloud_Object__dlm)  ' +
      'SELECT * FROM Some_Other_Cloud_Object__dlm AS b ' +
      'INNER JOIN cteAlias a  ON b.Id__c = a.Id__c';
    Assert.areEqual(expectedQuery, generatedQuery.replaceAll('(\n|\t)', ' '));
  }
}

If you’ve read You Need A Strongly Typed Query Builder, none of this should come as a surprise; using a fluent, builder-based API to abstract away the complexities of query creation is something I can’t recommend enough. I wanted to show off what a vector search method would look like, in particular, as the formatting syntax for that particular kind of query is otherwise quite painful to get right (look at the test method for vector_search to see what the expected output is, as an example of what I’m talking about).

Is this the end-all, be-all syntax to aspire to? Absolutely not! But it might help you create queries that can be both run in the Query Builder (hence the double quotes surrounding most things) and work in Apex. A pattern like this allows you to centralize logging, create easy-to-use, type-powered queries that are composable and reusable. By all means — improve upon it, iterate over it, and keep making it better in your own orgs.

Aggregating Complex Data

Consider the following Enrollments table:

Event Name	Status	Event Id	User Id
Event 1	Attended	foreign_key_1	1
Event 2	Did Not Attend	foreign_key_2	1
Event 3	Attended	foreign_key_3	3

If somebody asks the following question: “how many people have attended a given event? Also how many people haven’t signed up at all?”, that’s a complicated question! Or it would be, if we didn’t know SQL. Using the power of Common Table Expressions, we can trivially issue a query to the CDP Connect API (or the Query API, but be forewarned that this requires a Data Cloud access token, which can mean additional hoop jumping):

WITH RelevantEvents AS (
  SELECT "Name__c", "Id__c"
  FROM "Event__dlm"
  WHERE "Id__c" = 'injectedEventId'
),
Enrollments AS (
  SELECT DISTINCT(enrollment."User_Id_c__c")
  FROM "Enrollment__dlm" enrollment
  JOIN "RelevantEvents" ev on enrollment."Event_Id_c__c" = ev."Id__c"
  WHERE enrollment."Status_c__c" = 'Attended'
)

SELECT
  -- the inverted case
  COUNT(
    CASE
      WHEN enrollments."User_Id_c__c" IS NOT NULL THEN NULL
      ELSE 1
    END
  ) AS "Not Yet Enrolled",
  -- the regular case
  COUNT(enrollments."User_Id_c__c") AS "Attended",
  COUNT(u."Id__c") AS "Total"
FROM "User__dlm" u
LEFT JOIN Enrollments enrollments ON enrollments."User_Id_c__c" = u."Id__c"
WHERE "IsActive__c" = TRUE

This query outputs something like:

Not Yet Enrolled	Completed Training	Total
2131	13121	15252

Because we’re starting the query from Apex, things like the injectedEventId variable can be parameterized (which also holds true for the other variables, like the status). Here’s the same query produced by an instance of the aforementioned SqlQueryBuilder:

SqlQueryBuilder relevantEventsCTE = new SqlQueryBuilder('Event__dlm')
  .selectFields(new List<String>{ 'Name__c', '"Id__c"' })
  .filterWhere('"Id__c"', '=', injectedEventId);

SqlQueryBuilder enrollmentsCTE = new SqlQueryBuilder('Enrollment__dlm', 'enrollment')
  .selectField('DISTINCT(enrollment."User_Id_c__c")')
  .innerJoin('RelevantEvents', 'ev', 'enrollment."Event_Id_c__c" = ev."Id__c"')
  .filterWhere('enrollment."Status_c__c"', '=', 'Attended');

new SqlQueryBuilder('User__dlm', 'u')
  .selectFields(
    new List<String>{
      'COUNT(enrollments."User_Id_c__c" IS NOT NULL THEN NULL ELSE 1 END) AS "Not Yet Enrolled"',
      'COUNT(enrollments."User_Id_c__c") AS "Attended"',
      'COUNT(u."Id__c") AS "Total"'
    }
  )
  .withCTE(relevantEventsCTE, 'ev')
  .withCTE(enrollmentsCTE, 'enrollments')
  .leftJoin('Enrollments', 'enrollments', 'enrollments."User_Id_c__c" = u."Id__c"')
  .filterWhere('u."IsActive__c"', '=', true)
  .build();

De-Duplicating Data

It can be onerous and complicated to unify external data models against your own. While the possibility for de-duplication exists during the extract or transformation part of any ETL pipeline, having access to the raw data means you don’t have to make the choice between boxing yourself (and other teams) out of future data needs.

SQL is really, really good at de-duplication. There are three important window functions you should be familiar with when creating de-duplication rules:

RANK()
DENSE_RANK()
ROW_NUMBER()

Here’s a good StackOverflow article that walks through the various nuances associated with when to use each function in case the link to our docs above isn’t sufficient.

Here’s an example of using CTEs and functions to de-duplicate users when compared to an external data set:

-- Duplicates will have Rank columnar values of 2, 3, 4, etc...
WITH ExternalUsers AS (
SELECT
    ext.*,
    CASE
      WHEN u."User_Potentially_Null_Date_c__c" > ext."External_User_Potentially_Null_Date_c__c" THEN u."User_Potentially_Null_Date_c__c"
      WHEN ext."External_User_Potentially_Null_Date_c__c" > u."User_Potentially_Null_Date_c__c" THEN ext."External_User_Potentially_Null_Date_c__c"
      -- coalesce takes an arbitrary number of arguments and returns the first non-null value
      ELSE COALESCE(u."User_Potentially_Null_Date_c__c", ext."External_User_Potentially_Null_Date_c__c")
    END AS "Some Date We Care About",
   -- you can also use CASE statements _in_ an ORDER BY clause here to dynamically create rank values
   ROW_NUMBER() OVER(PARTITION BY "User_Id_c__c" ORDER BY "Id__c" DESC) AS "Rank"
  FROM "Ext_User__dlm" ext
  JOIN "User__dlm" u ON u."Id__c" = ext."User_Id_c__c"
    AND u."IsActive__c" = TRUE
)

-- most typically, you'd then be joining
-- this CTE to ANOTHER data set using
-- Rank = 1 as part of the join clause
SELECT *
FROM ExternalUsers
WHERE Rank = 1

Consumers of ExternalUsers can either join or otherwise select where Rank = 1. This example also show-cases another helpful SQL technique – that of blending data when there isn’t a single source of truth. By using a case statement in conjunction with COALESCE, we get the best possible result from what would otherwise be an inconsistent field within each dataset.

Dynamically Acting on Data Using Exists/Not Exists

EXISTS and NOT EXISTS historically had the benefit of query planner optimizations that made them slightly more efficient to execute than IN/NOT IN statements. While that performance improvement may not be as noticeable with more modern databases, I think knowing how to use EXISTS is still crucial when it comes to deciding which rows in a table to display.

In keeping with our event-based queries, here’s an example of using NOT EXISTS to only show events where a user hasn’t already attended:

SELECT *
FROM Event__dlm AS ev
WHERE 1 = 1
AND NOT EXISTS (
  SELECT 1
  FROM Enrollment__dlm AS e
  JOIN User__dlm u
		ON u.Id__c = e.User_Id_c__c
    AND u.IsActive__c = TRUE
  WHERE u.Email__c = '{user in question email address}'
  AND e.Status_c__c = 'Attended'
  AND e.Event_Id_c__c = ev.Id__c
)

Note that EXISTS and NOT EXISTS tend to use a subtle paradigm that you may be unfamiliar with — you can refer to an “outer” table within the parentheses, and you should do that, typically to provide an anchor to the data that’s part of the “outer” query. That’s not universally true — especially when querying relative date ranges — but it’s true often enough to warrant particular attention.

Introduce Constants With Cross Join

I was recently reviewing a colleague’s code and I saw something interesting — they were referring to a single, fixed date in six different places within their query. Using CTEs and the power of CROSS JOIN, you can replace repetition with a single source of truth like such:

With Constants AS (
  SELECT to_date('2015-01-01', 'YYYY-mm-dd') AS BoundaryDate
)
-- ...
SELECT a.*
FROM aTable__dlm a
CROSS JOIN Constants c
WHERE a.Some_Date_c__c >= c.BoundaryDate

Cross joins can also be useful when you need to provide things like consistent timestamps for every row, like such:

WITH Constants AS (
  SELECT now() AS Runtime
)

-- notice how in this one
-- we don't only select all rows and columns from "a"
-- but also all columns from c, which means each row
-- returned gets a "Runtime" with the same value
SELECT *
FROM aTable__dlm a
CROSS JOIN Constants c

Addressing Product Gaps

It’s unfortunate that the SOQL-to-SQL converter doesn’t support most of Data Cloud SQL’s functions, because it comes with great out of the box testability (via the Test.createSoqlStub methods). Another thing to be aware of is the subtle differences between SOQL and SQL that can make translated SQL clauses built using SOQL react in unexpected ways.

My team encountered this early on while using Data Cloud queries — we wanted to do a case-insensitive search for data. In SOQL, that’s trivial: all SOQL text searches are case insensitive. In SQL, however, that’s not the expected behavior. As a result, if you’re only familiar with SOQL as a query language, it won’t be obvious why something like this won’t work in Apex:

public static List<User__dlm> findMatchingUsers(String nameParticle) {
  return [
    SELECT Id
    FROM User__dlm
    WHERE Name__c LIKE  :'%' + passedInName + '%'
  ];
}

Or rather — it might work, but “work” will be relative to whether or not the casing of the nameParticle parameter exactly matches what’s in the database, which may be counter-intuitive when it comes to performing wildcard-based search.

The SqlQueryBuilder excerpt shown before makes something like this much easier:

public static List<User__dlm> findMatchingUsers(String nameParticle) {
  String dataCloudQuery = new SqlQueryBuilder('User__dlm', 'u')
    .selectField('Id__c')
    .filterWhere(
      'LOWER(u.Name__c)',
      'LIKE',
      'LOWER(\'%' + nameParticle.trim() + '%\')'
    )
  .build();
  // and then either call the query API or the connect API...
}

Be Careful When Comparing NULL In Case Statements

CASE statements have two forms:

the first is very similar to a switch statement
the second is a boolean comparison

If you have a column with possible row values of 1, 2, or 3, you can easily use the first form:

SELECT
  CASE YourField
    WHEN 1 THEN 'something'
    WHEN 2 THEN 'something else'
    ELSE 'Probably 3!'
  END AS "SomeCaseStatement"

But if you have a column with possible row values of ‘Hi’, ‘Hello’ or null, you’ll be in for a shock if you try to “switch” on that value:

SELECT
  CASE YourField
    WHEN NULL THEN 0
    -- I don't know why you need to cast here, but you do
    -- you can also cast with "YourField"::int
    -- but that's way uglier, in my opinion
    ELSE CAST("YourField" AS int)
  END AS "SomeCaseStatement"

That would produce the following columnar values, likely surprising you with the last null value:

SomeCaseStatement
1
2

Instead, use the second form:

SELECT
 CASE
   WHEN "YourField" IS NULL THEN 'Oh no!'
	 ELSE CAST("YourField" as varchar)
  END AS "SomeCaseStatement"

As that will produce something more along the lines of what you would expect:

SomeCaseStatement
Hi
Hello
Oh no!

TL;DR make sure you understand how NULL behaves in SQL, because particularly with comparisons it’s a bit different than what you might expect from a programming language.

Know The Difference Between Different Unions

If you use Tableau, chances are you’re already familiar with using UNION clauses to pad out data because that’s one of the common workarounds that prevents X axis values from appearing/disappearing as filters are applied. Otherwise…

UNION is an incredibly powerful operation in SQL. Let’s go back to our enrollments table:

Event Name	Status	Event Id	User Id
Event 1	Attended	foreign_key_1	1
Event 2	Did Not Attend	foreign_key_2	1
Event 3	Attended	foreign_key_2	3

Let’s now imagine another table where some other event data is stored for online events, not all of which have in-person attendance:

Name	Status	Event Id	User Id
Digital Event 4	Attended	foreign_key_4	1
Digital Event 2	Did Not Attend	foreign_key_2	1
Digital Event 5	Attended	foreign_key_3	null

In Apex, you’d have to perform post-processing on the result of two queries to massage this data into the correct format. In SQL? You can use UNION to create the combined data set. In this case, you also should use UNION ALL, as UNION performs a de-duplication step. Sometimes that behavior is exactly what you’re looking for, but if you know upfront that the data sets you’re combining are mutually exclusive, using UNION ALL can shave crucial milliseconds off of your query:

SELECT *
FROM Enrollment__dlm

UNION ALL

SELECT
  Name__c,
  Status_c__c,
  Event_Id_c__c,
-- you don't need to alias this last field
-- since UNION takes column names from the top SELECT
  CASE User_Id_c__c
    WHEN NULL THEN 'Anonymous'
    ELSE User_Id_c__c
  END
FROM DigitalEnrollment__dlm

UNIONed queries are quite powerful, though invariably I find that they end up being bittersweet to use. You inevitably end up needing to spell out every field you’re selecting, and in particular you have to be quite careful that the order you’re selecting fields exactly matches the order of the top-most SELECT statement in any particular UNION. You’re also frequently only doing that to include a CASE statement or two, but (!) those CASE statements end up enabling the data to be blended together to begin with. So there is that.

Finally, as I alluded to in the intro: UNION clauses give you the ability to perform complex sorting and post-processing logic on vector and hybrid-type searches in Data Cloud, giving you the ability to do things that the out-of-the-box retrievers can’t: like CASE-based sorts, keyword in chunk detection, and much more.

In Conclusion

SQL is a wonderfully expressive language, and it’s an extremely powerful tool at your disposal when working with Data Cloud. Because it’s so expressive, there are frequently three or more ways to achieve the same effect in SQL, which can lead to tribal-esque SQL idioms within organizations (“we prefer CTEs”, “we prefer sub selects”, “we prefer temp tables”, etc…).

That expressiveness can be overwhelming as a newcomer to the language; speaking from experience, it can also be punishing if you’re not in the habit of benchmarking because query performance without the ability to inspect the plan for the queries you’re producing can vary drastically depending on the approach. But I mention that not to dissuade you — if anything, it’s something that I view as an exciting part of writing SQL. I’ve been writing SQL for 15 years now, and I still routinely learn new ways to achieve the same result. For example, I was really pleased to discover the unnest function:

SELECT * FROM unnest(
  ARRAY[1, 2, 3, 4, 5, 6, 7],
  ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
) AS days("Day Number", "Day Name")

That query produces a table like this:

Day Number	Day Name
1	Mon
2	Tue
3	Wed
4	Thu
5	Fri
6	Sat
7	Sun

Having tools like this at your disposal can be really helpful when building up “constants”-type tables in the memory of the query you’re running. The unnest docs page has some additional helpful nuance on how arrays of unequal size end up being used.

If you’re working with Tableau a lot, you may also want to get familiar with the Tableau Hyper API, as it comes with even more options for working with SQL. I’m still hopeful that support for scalar functions will be introduced someday, because complex pieces of logic in SQL (particularly the sort of thing that involves long CASE expressions) sometimes have to be duplicated. This is a frequent occurrence with UNION clauses where the same logic needs to be applied to blended data regardless of its source.

I also know that the “hub and spoke” org model for Data Cloud means that OAuth flows end up with an additional complication when it comes to getting access tokens to query Data Cloud via REST from a different org. Depending on your use-case, Data Cloud One may be more viable; I’ve also heard from the External Auth Identity Provider team (you may remember their work from Implementing OAuth Browser Flows Properly) that they are working on improvements to the current process which will allow you to declaratively wire up token exchanges like this one — safe harbor!.

As always, thanks to Arc and Henry for their support on Patreon. I haven’t had many chances to sit down and write, recently — this very article has been brewing for many moons now — but I love that regardless of what else is happening in life and work, I always get the chance to sit down and do deep dives like this. Hopefully it helps, and thanks for reading!

← Reduce, Reuse, Refactor: Interfaces & Object Inheritance

In the past 6 years, hundreds of thousands of you have come to read & enjoy the Joys Of Apex. Over that time period, I've remained staunchly opposed to advertising on the site, but I've made a Patreon account in the event that you'd like to show your support there. Know that the content here will always remain free. Thanks again for reading — see you next time!