2 ways to Split String with Dot (.) in Java using Regular Expression

You can use the split() method of the java.lang.String class to split a String on a dot. Unlike a comma, colon, or whitespace, a dot is not a common delimiter for joining Strings, which is one reason beginners often struggle to split a String by dot. The other reason is that the dot is a special character in regular expressions. To split a String on a dot, you need to escape the dot as "\\." instead of just passing "." to the split() method. Alternatively, you can use the regular expression [.] to split the String by a dot in Java. Splitting on a dot is mostly used to get a file extension, as shown in our example. The logic and method are exactly the same as in the earlier examples of splitting a String on a space and on a comma; the only difference is the regular expression. Once you know how to write the regular expression, all these examples look the same. Java's regular expression syntax is inspired by Perl.


Regular expressions are often considered an advanced concept by Java developers, which is the main reason many Java programmers are not comfortable using them for text processing. Also, some of the most popular Java books, like Head First Java or Core Java Volume 1, don't cover regular expressions in much detail; but as a Java developer, you cannot ignore them.


Regular expressions are among the best and most powerful tools for an experienced developer, whether you are working on Java projects or searching log files for patterns on a UNIX box. Since Java's regular expressions are Perl-like, learning Java regex also helps you use the find and grep commands effectively. If you are an experienced Java developer, you should read Java Regular Expressions: Taming the java.util.regex Engine to master regex in Java. It will not only teach you the basics of regex but also show you how to use them effectively.

Splitting String by Dot in Java using Regular Expression

First try
Most Java programmers first try the following approach when they need to split a String on the dot character:

String textfile = "ReadMe.txt";
String filename = textfile.split(".")[0];
String extension = textfile.split(".")[1];

This will not work because the dot (.) is a special character in Java regular expressions that matches any single character. The code above throws java.lang.ArrayIndexOutOfBoundsException: 0 because split() returns an empty array: every character is treated as a delimiter, and the resulting empty strings are all discarded. The stack trace looks like this:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at StringSplitWithRegEx.main(StringSplitWithRegEx.java:9)
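
To make the empty array less mysterious, here is a minimal sketch (the class and method names are hypothetical, chosen only for illustration). It compares split(".") with its default limit against split(".", -1), where the negative limit tells split() to keep trailing empty strings instead of discarding them:

```java
import java.util.Arrays;

public class DotSplitPitfall {

    // Default limit (0): trailing empty strings are removed from the result.
    static int piecesWithDefaultLimit(String input) {
        return input.split(".").length;
    }

    // Negative limit: trailing empty strings are kept in the result.
    static int piecesKeepingEmpties(String input) {
        return input.split(".", -1).length;
    }

    public static void main(String[] args) {
        String textfile = "ReadMe.txt"; // 10 characters, each one matches "."

        System.out.println(piecesWithDefaultLimit(textfile)); // prints 0
        System.out.println(piecesKeepingEmpties(textfile));   // prints 11

        // Every piece is an empty string, because every character was a delimiter.
        System.out.println(Arrays.toString(textfile.split(".", -1)));
    }
}
```

With the negative limit you can see that all ten characters were consumed as delimiters, leaving eleven empty pieces; with the default limit they are all dropped, hence the empty array.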

The problem with the first attempt is that "." is a metacharacter. If you want to use it literally, you need to escape it with a backslash. In the regular expression itself one backslash is enough, i.e. \., but because the backslash also needs escaping inside a Java String literal, you need two backslashes, i.e. "\\.", as shown below:

String textfile = "ReadMe.txt";
String filename = textfile.split("\\.")[0];
String extension = textfile.split("\\.")[1];
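
The two levels of escaping are easy to mix up, so here is a small sketch (the class and constant names are assumptions made for this illustration) showing what the regex engine actually receives when you write "\\." in Java source:

```java
public class EscapeLevels {

    // The Java literal "\\." compiles to a two-character String:
    // a backslash followed by a dot. That is the regex the engine sees.
    static final String ESCAPED_DOT = "\\.";

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT);          // prints \.
        System.out.println(ESCAPED_DOT.length()); // prints 2

        String[] parts = "ReadMe.txt".split(ESCAPED_DOT);
        System.out.println(parts[0]); // prints ReadMe
        System.out.println(parts[1]); // prints txt
    }
}
```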

Alternatively, you can also use the [.] regular expression to split the String by dots in Java, as shown below:

String extension = "minecraft.exe".split("[.]")[1];

The reason [.] works is that the dot is inside a character class, i.e. square brackets []. Only the characters ]^-\ have special meaning inside character classes (Java additionally treats [ and && specially there, for nested classes and intersection), and the dot is not one of them, which means you can use it literally inside a character class.
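
A quick way to convince yourself of the difference is to compare a bare dot with a dot inside a character class using String.matches(). This is only an illustrative sketch; the class and method names are hypothetical:

```java
public class LiteralDotCheck {

    // A bare dot matches any single character between 'a' and 'b'.
    static boolean matchesWithBareDot(String s) {
        return s.matches("a.b");
    }

    // Inside a character class, the dot matches only a literal '.'.
    static boolean matchesWithLiteralDot(String s) {
        return s.matches("a[.]b");
    }

    public static void main(String[] args) {
        System.out.println(matchesWithBareDot("a.b"));    // prints true
        System.out.println(matchesWithBareDot("axb"));    // prints true
        System.out.println(matchesWithLiteralDot("a.b")); // prints true
        System.out.println(matchesWithLiteralDot("axb")); // prints false
    }
}
```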

It's good to remember which characters have special meaning in Java regular expressions, inside and outside of a character class:

1) The characters .^$|*+?()[{\ have special meaning outside of character classes.
2) The characters ]^-\ have special meaning inside of character classes.
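
If you would rather not memorize these lists, one option beyond the two approaches covered in this article is to let the JDK do the escaping with java.util.regex.Pattern.quote(), which returns a pattern that matches its argument literally. The sketch below is a side note, not a replacement for the techniques above; the class and helper names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedDelimiterSplit {

    // Splits on the delimiter taken literally, whatever metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("192.168.0.1", "."))); // prints [192, 168, 0, 1]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));       // prints [a, b, c]
    }
}
```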


Java Example to Split String by Dot

Here is our sample program to show you how to split a String by dot (.) in Java. In this example, you will see what works, what doesn't work, and why. The code is much like splitting a String by any other delimiter; the only thing to focus on is using the correct regular expression, because the dot is a special character in Java's regular expression API.

import java.util.Arrays;

public class StringSplitWithRegEx {

    public static void main(String[] args) {

        // 1st example - splitting string by dot in Java
        // this will not work because dot is a special character in
        // regular expressions, where it matches any single character
        String file = "abc.txt";
        String[] array = file.split(".");

        System.out.println("input string: " + file);
        System.out.println("output array after splitting with . : " + Arrays.toString(array));

        // solution is to escape the dot in Java as shown below
        array = file.split("\\.");
        System.out.println("input string: " + file);
        System.out.println("output array after splitting with regex'\\.' : " + Arrays.toString(array));

        // or you can also use the following regular expression to split string on dot (.)
        array = file.split("[.]");
        System.out.println("input string: " + file);
        System.out.println("output array after splitting with regex '[.]' : " + Arrays.toString(array));

        // once you have the individual parts, you can get the file name and extension as follows
        String filename = array[0];
        String extension = array[1];

        System.out.println("file: " + file);
        System.out.println("name: " + filename);
        System.out.println("extension: " + extension);
    }

}

Output
input string: abc.txt
output array after splitting with . : []
input string: abc.txt
output array after splitting with regex'\.' : [abc, txt]
input string: abc.txt
output array after splitting with regex '[.]' : [abc, txt]
file: abc.txt
name: abc
extension: txt
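
One caveat when using this for file extensions: array[1] is only the extension when the name contains exactly one dot. The following sketch (a hypothetical helper, not part of the program above) takes the last piece instead and returns an empty String when there is no dot to split on:

```java
public class FileExtensionSketch {

    // Returns the text after the last dot, or "" if splitting yields a single piece.
    static String lastExtension(String fileName) {
        String[] parts = fileName.split("\\.");
        return parts.length > 1 ? parts[parts.length - 1] : "";
    }

    public static void main(String[] args) {
        System.out.println(lastExtension("abc.txt"));        // prints txt
        System.out.println(lastExtension("archive.tar.gz")); // prints gz
        System.out.println(lastExtension("README").isEmpty()); // prints true
    }
}
```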


That's all about how to split a String by dot in Java. Remember, even though you can use the split() method the same way you used it earlier to split by comma and whitespace, the tricky part here is that the dot is a special character in regular expressions.

To use the dot literally, you need to escape it, e.g. \., but since the backslash itself also requires escaping in a Java String literal, you should write "\\.", i.e. a double backslash. Alternatively, you can use [.], because only the characters ]^-\ have special meaning inside character classes ([...]) and the dot is not one of them, which means it is treated literally.


