Wednesday, September 15, 2021

StringTokenizer Example in Java with Multiple Delimiters - Example Tutorial

StringTokenizer is a legacy class for splitting strings into tokens. To break a String into tokens, you create a StringTokenizer object and provide the delimiters to split on. You can pass multiple delimiters, e.g. you can break a String into tokens on , and : at the same time. If you don't provide any delimiter, it uses whitespace by default (space, tab, newline, carriage return, and form feed). It's inferior to split() because it doesn't support regular expressions, and it is also not very efficient. Since it's a legacy class, don't expect any performance improvements either. On the other hand, split() got a major performance boost in Java 7.

StringTokenizer looks easier to use, but you should avoid it except for trivial tasks. Prefer String's split() method for splitting a String, and for repeated splits use a precompiled Pattern and its split() method.
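As a rough illustration of that advice, here is a minimal sketch of both approaches, splitting on comma and colon. The class name SplitSketch, the constant COMMA_OR_COLON, and the sample strings are made up for this example; only String.split() and Pattern.split() come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {

    // Compiled once and reused for every call; the name is illustrative.
    private static final Pattern COMMA_OR_COLON = Pattern.compile("[,:]");

    // One-off split: String.split() takes a regular expression.
    static String[] splitOnce(String line) {
        return line.split("[,:]");
    }

    // Repeated split: reuse the precompiled Pattern instead of passing the regex again.
    static String[] splitRepeated(String line) {
        return COMMA_OR_COLON.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnce("one,two:three"))); // [one, two, three]

        for (String line : new String[] {"a:b,c", "d,e:f"}) {
            System.out.println(Arrays.toString(splitRepeated(line)));    // [a, b, c] then [d, e, f]
        }
    }
}
```

The precompiled Pattern mainly pays off when the same delimiter expression is applied to many lines, for example while reading a file.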

Coming back to StringTokenizer, we will see three examples of StringTokenizer in this article. The first example is to break String based on white-space, the second example will show how to use multiple delimiters, and the third example will show you how to count the number of tokens.

In order to get tokens, you basically follow the Enumeration style model, i.e. checking for more tokens using hasMoreTokens() and then getting tokens using nextToken().
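The Enumeration-style model is literal: StringTokenizer implements java.util.Enumeration, so hasMoreElements() and nextElement() behave like hasMoreTokens() and nextToken(), except that nextElement() returns Object. The sketch below is only an illustration; the class name EnumerationSketch and the sample text are assumptions.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

class EnumerationSketch {

    // StringTokenizer is an Enumeration<Object>, so Collections.list() can drain it into a List.
    static List<Object> drain(String text) {
        return Collections.list(new StringTokenizer(text));
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("alpha beta gamma");

        // Same loop shape as hasMoreTokens()/nextToken(), using the Enumeration methods.
        while (st.hasMoreElements()) {
            Object token = st.nextElement();
            System.out.println(token);
        }

        System.out.println(drain("alpha beta gamma")); // [alpha, beta, gamma]
    }
}
```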

And if you are new to the Java world, then I also recommend you go through The Complete Java MasterClass on Udemy to learn Java in a better and more structured way. This is one of the best and most up-to-date courses to learn Java online.





Java StringTokenizer Example

Here is the full code of our Java StringTokenizer example. You can copy-paste this code into your favorite IDE and run it straight away. It doesn't require any third-party library like Apache Commons or Google Guava. All you need to do is create a Java source file with the same name as the public class of this example; the IDE will take care of compiling and running it.


Alternatively, you can compile and run this example from the command prompt. In the first example, we have a String whose words are separated by whitespace. To get each word, we create a StringTokenizer by passing just that String. Notice that we have not provided any delimiter, because by default StringTokenizer uses whitespace as the token separator.


To get each token, in our case a word, you just loop until hasMoreTokens() returns false, and call nextToken() inside the loop to get the word itself. This is similar to iterating over a Java Collection using an Iterator, where we use hasNext() as the while loop condition and next() to get the next element.


The second example is more interesting because here our text is a web address, which has a protocol, an IP address, and a port. Here we pass several delimiter characters to split the HTTP string: / (slash), : (colon), and . (dot). The delimiter argument is treated as a set of individual characters, not as one sequence, so StringTokenizer ends a token whenever any of these characters is found in the target String.


The third example shows you how to get the total number of tokens from StringTokenizer, which is quite useful if you want to copy tokens into an array or collection, as you can use this number to decide the length of the array or the size of the respective collection.

import java.util.StringTokenizer;

/**
 * Java program to show how to use StringTokenizer for breaking a delimited
 * String into tokens. StringTokenizer allows you to use multiple delimiters as
 * well, which means you can split a String containing comma and colon in one call.
 *
 * @author Javin Paul
 */
public class StringTokenizerDemo{
   
    public static void main(String args[]) {

        // Example 1 - By default StringTokenizer breaks String on space
        System.out.println("StringTokenizer Example in Java, split String on whitespace");

        String word = "Which one is better, StringTokenizer vs Split?";
        StringTokenizer tokenizer = new StringTokenizer(word);
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }


        // Example 2 - StringTokenizer with multiple delimiter
        System.out.println("StringTokenizer multiple delimiter Example in Java");

        String msg = "http://192.173.15.36:8084/";
        StringTokenizer st = new StringTokenizer(msg, "://.");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
       
       
        // Example 3 - Counting number of String tokens
        System.out.println("StringTokenizer count Token Example");

        String records = "one,two,three,four,five,six,seven";
        StringTokenizer breaker = new StringTokenizer(records, ",");
        System.out.println("Total number of tokens : " + breaker.countTokens());
    }
}
Output:
StringTokenizer Example in Java, split String on whitespace
Which
one
is
better,
StringTokenizer
vs
Split?

StringTokenizer multiple delimiter Example in Java
http
192
173
15
36
8084

StringTokenizer count Token Example
Total number of tokens : 7
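The third example only prints the count. As a minimal sketch of the array use case mentioned earlier, the helper below sizes an array with countTokens() and then fills it. The class name CountTokensSketch and the method toArray are hypothetical. One detail worth knowing is that countTokens() reports the tokens still remaining, so it should be read before any call to nextToken().

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class CountTokensSketch {

    // Copies all tokens into an array sized up front with countTokens().
    static String[] toArray(String text, String delimiters) {
        StringTokenizer st = new StringTokenizer(text, delimiters);
        // Read the count before consuming anything: it shrinks as tokens are taken.
        String[] tokens = new String[st.countTokens()];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = st.nextToken();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String records = "one,two,three,four,five,six,seven";
        System.out.println(Arrays.toString(toArray(records, ",")));
        // [one, two, three, four, five, six, seven]

        StringTokenizer st = new StringTokenizer("one,two,three", ",");
        st.nextToken();                       // consume "one"
        System.out.println(st.countTokens()); // 2, only the remaining tokens are counted
    }
}
```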

As I said, all of this functionality is also available through the String class's split() method, and you should use that as your default tool for creating tokens from a String or breaking it up on any delimiter. To learn more about the pros and cons of StringTokenizer and split(), you can see my post on the difference between split() vs StringTokenizer in Java.
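To make the comparison concrete, here is a small sketch that runs the web address from the second example through both tools. The class and method names are invented for illustration. The regular expression [:/.]+ is one possible equivalent of the "://." delimiter set, not the only one. The last two lines show a difference to keep in mind: StringTokenizer silently skips empty tokens between adjacent delimiters, while split() keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SplitVersusTokenizerSketch {

    // Collects every token produced by StringTokenizer for the given delimiter characters.
    static List<String> withTokenizer(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Splits with a regular expression instead of a delimiter character set.
    static List<String> withSplit(String text, String regex) {
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        String url = "http://192.173.15.36:8084/";
        System.out.println(withTokenizer(url, "://.")); // [http, 192, 173, 15, 36, 8084]
        System.out.println(withSplit(url, "[:/.]+"));   // [http, 192, 173, 15, 36, 8084]

        String csv = "a,,b";
        System.out.println(withTokenizer(csv, ","));    // [a, b]
        System.out.println(withSplit(csv, ","));        // [a, , b]
    }
}
```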


That's all on how to use StringTokenizer in Java with multiple delimiters. Yes, it's convenient, especially if you are not very comfortable with regular expressions. By the way, if that's the case, then you had better spend some time learning regular expressions, not just to split a String into tokens but as a general skill.

You will be surprised to see the power of regular expression, while searching, replacing and doing other text stuff. StringTokenizer is also a legacy class, which is only retained for compatibility reasons and you should not use it in new code.

It is recommended to use the split() method of String, or the split() method of Pattern from the java.util.regex package, instead. In terms of performance too, split() got a major boost from Java 6 to Java 7, and it's reasonable to expect further performance improvements only in split(), because no work will be done on StringTokenizer.


5 comments:

  1. Nice writeup Paul. I think StringTokenizer is one of those classes which' existence is often forgotten. I also 100% agree on the importance on regular expressions. Might look scary at first, but they are simply unavoidable sooner or later.

    Just a sidehint: For parsing more complex strings, one can use parsii which provides a quite flexible tokenizer capable of frequently used token types and unlimited lookahead: https://github.com/scireum/parsii (open source + MIT license of course)

    cheers Andy

  2. Hi,
    How can i tokenize string with delimeter ,, or .. (one delim followedby other)?

  3. Hello Anonymous, not sure if you can use multiple delimiters in one go. Your solution may be to first break the string on one delimiter and then break those tokens again on the next delimiter. That's not ideal, but I think it will work.

  4. Hey how do I put a space in the delimiter list if I have multiple delimiters. Example list of delimiters -,./?! and with all these space is also included as a delimiter.

  5. hello great writeup, i have a large data in a text file which i have to insert into database example: 1<<__@__>> John:3,1<<__@__>>Paul:3,2<<__@__>> Google:2, and so on here 1 is the document id john is a person name and 3 is it name id . is there any way i can get these data in a proper format so it gets directly inserted into the database rather than printing all in new line thank you please guide
