Regular Expression to Escape SOQL/SOSL Inputs


Hello, and welcome to Another Salesforce Blog!  Here I will be posting solutions to problems that I couldn’t find an answer to in hopes of helping those who find themselves stuck when using the Salesforce platform.

User Story

We want to create a table with search, sort, and pagination functionality. Over the course of the next several posts, we are going to look at two possible ways to achieve this, with either server-side or client-side functionality. We will start with the basics of searching: writing our queries.

If you would like to skip ahead to the regular expression bit, scroll down to Step Two.

Background

Salesforce Object Search Language (SOSL) and Salesforce Object Query Language (SOQL) are two powerful query languages that allow us to search for records within Salesforce. We will be using SOQL for our project, but going over how to utilize both.

How do you know which to use? The documentation linked above lays it out:

When to Use SOSL

Use SOSL when you don’t know which object or field the data resides in, and you want to:

  • Retrieve data for a specific term that you know exists within a field. Because SOSL can tokenize multiple terms within a field and build a search index from this, SOSL searches are faster and can return more relevant results.
  • Retrieve multiple objects and fields efficiently where the objects might or might not be related to one another.
  • Retrieve data for a particular division in an organization using the divisions feature.
  • Retrieve data that’s in Chinese, Japanese, Korean, or Thai. Morphological tokenization for CJKT terms helps ensure accurate results.

When to Use SOQL

Use SOQL when you know which objects the data resides in, and you want to:

  • Retrieve data from a single object or from multiple objects that are related to one another.
  • Count the number of records that meet specified criteria.
  • Sort results as part of the query.
  • Retrieve data from number, date, or checkbox fields.

Clear as mud, right?

Each language has its own quirks. For example, the minimum search length for SOSL is two non-wildcard characters, SOSL will only return 2000 records, SOQL is what you use if you need to sort, and they both have different sets of reserved characters that need to be escaped.

Say what?

Yep! In simple terms, when writing code, there are certain characters that are treated as part of the code. In order for them to be read as plain characters, they need to be escaped.

In SOSL, these characters are: &|!(){}[]^"~*?:\'+-

In SOQL, the only characters needing to be escaped are: "'\_%

Why does this matter?

If a client wants to find the contact Evelyn "Another Salesforce Blog" Grizzle or an account named Fifty%Fifty, they might want to search by the special characters contained within the name to narrow down the search results. Or, if, for example, a naming convention contains underscores, such as 06_30_2021_Account, we will want to be able to include the underscores in our search. And, let’s be very serious, we’ve all known that person who names an account ***Very Important Account***, and we should factor them in, too. In order to do this, we must escape the requisite characters in order to be able to search by them.

This tutorial requires knowledge of Lightning Web Components, Apex, SOQL/SOSL, regular expressions, and requires Visual Studio Code.

Solution

Step One – Build the Query in Apex.

We’ll keep it simple and query for an account and its related contacts. Ultimately, we will be using a static SOQL query for this project, but let’s see what this looks like with a dynamic query.

Dynamic Query – SOQL
public static List<Account> getAccounts_dynamicSOQL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    String queryString = 'SELECT Id, Name, (SELECT Id, Name FROM Contacts) FROM Account WHERE Name LIKE :sanitizedSearchString';

    List<Account> listToReturn = database.query(queryString);
    return listToReturn;
}
Static Query – SOQL
public static List<Account> getAccounts_staticSOQL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    List<Account> listToReturn = [SELECT Id, Name, (SELECT Id, Name FROM Contacts) FROM Account WHERE Name LIKE :sanitizedSearchString];
    return listToReturn;
}

If, say, we wanted to search for any Account or Contact with a specific name, we could use SOSL. This is possible in SOQL with multiple queries, but we’re keeping it simple for this exercise, so we’ll just use SOSL.

Dynamic Query – SOSL
public static List<List<SObject>> getAccountsAndContacts_dynamicSOSL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    String queryString = 'FIND :sanitizedSearchString IN Name Fields RETURNING Account, Contact';
    
    List<List<SObject>> listToReturn = search.query(queryString);
    return listToReturn;
}
Static Query – SOSL
public static List<List<List<SObject>> getAccountsAndContacts_staticSOSL(String searchString) {
    /** STRING SANITIZATION TO HAPPEN HERE! */
    List<List<SObject>> listToReturn = [FIND :sanitizedSearchString IN Name Fields RETURNING Account, Contact];
    return listToReturn;
}

These queries look fine, but they’re susceptible to injection as written because we aren’t escaping single quotes. We can easily do this with the method String.escapeSingleQuotes(stringToEscape), but we also want to escape the reserved special characters, so we will handle this with a regular expression.

Step Two – Build the Regular Expression.

Remember what I said about SOSL and SOQL having different reserved characters? This is where that is going to make a difference.

Foolishly, I thought that they had the same reserved characters, and struggled to come up with the regex as a result. My wasted weekend is your gain, as I will be sharing and explaining the search patterns below.

We will be creating a reusable sanitization method, sanitizeString, that we will run our searchText input through for each of our queries. We will do this utilizing the Java Patterns Library and the Salesforce documentation.

SOSL REGEX

Again, the reserved characters that we must escape in SOSL are: In SOSL, these characters are: &|!(){}[]^"~*?:\'+-

This matches the \\p{Punct} class in Java fairly well, as \\p{Punct} represents all punctuation characters, .!#$%&'()*+,-./:;<=>?@[\]^_`{|}~. There are a handful of characters that we have to remove from the \\p{Punct} list, but it’s a good starting off point. We can either build a regex from the ground up, or use character subtraction from \\p{Punct} to get our values. Let’s use character subtraction.

public static String sanitizeStringSOSL(String searchText) {
    // \\p{Punct} in the pattern class matches all punctuation, including single quotes
    // https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
    String REGEXP = '[\\p{Punct}&&[^.#$%,/;<=>@_`]]';
    // \\$0 is the entire matched string, and \\ escapes it
    String REPLACE = '\\\\$0';

    return '%' + searchText.replaceAll(REGEXP, REPLACE) + '%';
}
SOQL REGEX

First, we can use a regex generator, such as Regex101 to generate a basic regex string. It is important to note that our characters will need to be escaped for Salesforce to interpret them properly, so our string will look a little bit different, but this is a great way to get an idea of what we’re looking for.

Testing the unescaped regex with https://regex101.com/

We escape each of the characters once to get the literal value in the regex generator, and then we escape each escape character so that Salesforce can read it. In order from right to left, \%_"':

  • \: the backslash character needs to be escaped three times in order for it to appear in Java. This is because it is the escape character, so we need to escape the escape that escapes it.
  • %: \% is not a valid string literal in Salesforce, so we escape it.
  • _: \_ is not a valid string literal in Salesforce, so we escape that too.
  • ": again, we need escape a backslash three times for it to appear once, and that one backslash escapes our double quote.
  • ': we escape the single quote once so that it appears without breaking our string literal, and then escape that backslash with another backslash, which needs to be escaped three times.

Easy as pie, right?

public static String sanitizeStringSOQL(String searchText) {
    // regex to escape SOQL characters per documentation
    // https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_quotedstringescapes.htm
    String REGEXP = '\\\\\'|\\\\"|\\_|\\%|\\\\';
    // \\$0 is the entire matched string, and \\ escapes it
    String REPLACE = '\\\\$0';

    return '%' + searchText.replaceAll(REGEXP, REPLACE) '%';
}

Now that we have our regular expressions, we’re able to start with a server side search! Stay tuned.

Thanks for reading, let me know if you have any comments or questions!

Evelyn Grizzle

Another Salesforce Blog

One-Time
Monthly
Yearly

Make a one-time donation

Make a monthly donation

Make a yearly donation

Choose an amount

$5.00
$15.00
$50.00
$5.00
$15.00
$50.00
$5.00
$15.00
$50.00

Or enter a custom amount

$

Help keep Another Salesforce Blog on the internet by donating today!

Your contribution is appreciated.

Your contribution is appreciated.

DonateDonate monthlyDonate yearly

2 responses to “Regular Expression to Escape SOQL/SOSL Inputs”

  1. Hi,
    nice article, but there should not be a + at the end of the sanitizeStringSOSL regex because if there is a special character following another, only the first will be escaped if the plus sign is present :

    String REGEXP = ‘[\\p{Punct}&&[^.#$%,/;@_`]]+’;
    should rather be
    String REGEXP = ‘[\\p{Punct}&&[^.#$%,/;@_`]]’;

    Regards

Leave a Reply

%d bloggers like this: