This week's coding exercise is to remove duplicate characters from String in Java. For example, if the given String is "aaaaaa" then the output should be "a", because the rest of the "a" are duplicates. Similarly, if the input is "abcd" then the output should also be "abcd" because there is no duplicate character in this String. By the way, you can not use any third-party library or Java API method to solve this problem, you need to develop your own logic or algorithm and then write code to implement that algorithm. This is one of the interesting problems from Java coding interviews and you can use this program to weed out Java programmers who cannot write code.
This problem is much better than Fizzbuzz because it requires more logic and coding than solving Fizzbuzz.
In this article, I'll share two ways to remove duplicate characters from String and will discuss the pros and cons, the time complexity of each solution.
This problem is much better than Fizzbuzz because it requires more logic and coding than solving Fizzbuzz.
In this article, I'll share two ways to remove duplicate characters from String and will discuss the pros and cons, the time complexity of each solution.
Sometimes, a choice is restricted by putting additional constraints put by the interviewer, that's why it's better to know multiple ways to solve a problem. This not only helps in understanding the problem better but also in comparative analysis.
Also, basic knowledge of essential data structure is also very important and that's why I suggest all Java programmers join comprehensive Data Structure and Algorithms courses like these Data structures and Algorithms courses to fill the gaps in your understanding.
Btw, This is a very popular coding-based question on tech companies' job interviews and you will this problem in all good coding interview questions books like Cracking the Coding Interview, which contains more than 189 code-based problems from various interviews.
Also, basic knowledge of essential data structure is also very important and that's why I suggest all Java programmers join comprehensive Data Structure and Algorithms courses like these Data structures and Algorithms courses to fill the gaps in your understanding.
Btw, This is a very popular coding-based question on tech companies' job interviews and you will this problem in all good coding interview questions books like Cracking the Coding Interview, which contains more than 189 code-based problems from various interviews.
Solution 1 - Replacing duplicate with NUL character
Our first solution is coded in the method removeDuplicates(String word), it takes a String and returns another String without duplicates. This algorithm goes through each character of String to check if it's a duplicate of an already found character.It skips the duplicate character by inserting 0, which is later used to filter those characters and update the non-duplicate characters.
The Time Complexity of this solution is O(n^2), excluded to uniqueString() method, which creates a String from the character array. This method will work even if the input string contains more than one duplicate character.
This algorithm is also an example of a brute force algorithm to solve a programming problem. To learn more about different types of algorithms in the programming world e.g. greedy algorithm, you should check Introduction to Algorithm by Thomas Cormen, one of the best books to learn computer algorithms.
The Time Complexity of this solution is O(n^2), excluded to uniqueString() method, which creates a String from the character array. This method will work even if the input string contains more than one duplicate character.
This algorithm is also an example of a brute force algorithm to solve a programming problem. To learn more about different types of algorithms in the programming world e.g. greedy algorithm, you should check Introduction to Algorithm by Thomas Cormen, one of the best books to learn computer algorithms.
Solution 2 - Using an ASCII table
Our second solution is coded in removeDuplicatesFromString(String input) method. This solution assumes that the given input String only contains ASCII characters. You should ask this question to your Interviewer, if he says ASCII then this solution is Ok.This Algorithm uses an additional memory of constant size. It creates a boolean array of length 256 to accommodate all ASCII characters. Initially, all array elements are false. This array is used as a Set to check if an ASCII character is seen already or not. You iterate over the character array and mark all characters which are seen by storing true in the corresponding index of the ASCII array.
For example, if input String is "abcda" then ASCII['a'] = true means at index 65 true is stored, next time when you find this character 'a', you know that this character has appeared before because of true element hence it's duplicate, so you store here ZERO or NUL.
Later you remove all those characters and create a unique String as output. See Unicode demystified to learn more about encoding schemes in the computer world.
That's all about how to remove duplicate characters from given String in Java. You can use this logic to remove duplicate values from the array in Java as well. If you need more practice, you can check out these coding interview courses. It contains more than 190 problems and their solutions from various tech companies programming job interviews.
Other coding problems for practicing in Java:
Java Solution - Remove duplicate characters from given String
Here is my solution for the problem of removing repeated or duplicate characters from given String in the Java programming language. If you understand the logic you can write this solution in any programming language like C, C++, C#, Python, or JavaScript./** * Java Program to remove duplicate characters from String. * * @author Javin Paul */ public class RemoveDuplicateCharacters{ public static void main(String args[]) { System.out.println("Call removeDuplicates(String word) method ...."); String[] testdata = {"aabscs", "abcd", "aaaa", null, "", "aaabbb", "abababa"}; for (String input : testdata) { System.out.printf("Input : %s Output: %s %n", input, removeDuplicates(input)); } System.out.println("Calling removeDuplicatesFromString(String str)."); for (String input : testdata) { System.out.printf("Input : %s Output: %s %n", input, removeDuplicatesFromString(input)); } } /* * This algorithm goes through each character of String to check if its * a duplicate of already found character. It skip the duplicate * character by inserting 0, which is later used to filter those * characters and update the non-duplicate character. * Time Complexity of this solution is O(n^2), excluded to * UniqueString() method, which creates String from character array. * This method will work even if String contains more than one duplicate * character. */ public static String removeDuplicates(String word) { if (word == null || word.length() < 2) { return word; } int pos = 1; // possible position of duplicate character char[] characters = word.toCharArray(); for (int i = 1; i < word.length(); i++) { int j; for (j = 0; j < pos; ++j) { if (characters[i] == characters[j]) { break; } } if (j == pos) { characters[pos] = characters[i]; ++pos; } else { characters[pos] = 0; ++pos; } } return toUniqueString(characters); } /* * This solution assumes that given input String only contains * ASCII characters. You should ask this question to your Interviewer, * if he says ASCII then this solution is Ok. This Algorithm * uses additional memory of constant size. */ public static String removeDuplicatesFromString(String input) { if (input == null || input.length() < 2) { return input; } boolean[] ASCII = new boolean[256]; char[] characters = input.toCharArray(); ASCII[input.charAt(0)] = true; int dupIndex = 1; for (int i = 1; i < input.length(); i++) { if (!ASCII[input.charAt(i)]) { characters[dupIndex] = characters[i]; ++dupIndex; ASCII[characters[i]] = true; } else { characters[dupIndex] = 0; ++dupIndex; } } return toUniqueString(characters); } /* * Utility method to convert Character array to String, omitting * NUL character, ASCII value 0. */ public static String toUniqueString(char[] letters) { StringBuilder sb = new StringBuilder(letters.length); for (char c : letters) { if (c != 0) { sb.append(c); } } return sb.toString(); } } Output Call removeDuplicates(String word) method .... Input : aabscs Output: absc Input : abcd Output: abcd Input : aaaa Output: a Input : null Output: null Input : Output: Input : aaabbb Output: ab Input : abababa Output: ab Calling removeDuplicatesFromString(String str) method .... Input : aabscs Output: absc Input : abcd Output: abcd Input : aaaa Output: a Input : null Output: null Input : Output: Input : aaabbb Output: ab Input : abababa Output: ab
That's all about how to remove duplicate characters from given String in Java. You can use this logic to remove duplicate values from the array in Java as well. If you need more practice, you can check out these coding interview courses. It contains more than 190 problems and their solutions from various tech companies programming job interviews.
Other coding problems for practicing in Java:
- How to reverse a String in place in Java? (solution)
- Top 20 Amazon and Google Interview Questions? (list)
- How to reverse an ArrayList in place in Java? (solution)
- How to print the Fibonacci series without recursion? (solution)
- How to find duplicate words in Java String? (solution)
- How do you print prime numbers up to a given number? (solution)
- How to check if a String is a Palindrome in Java? (solution)
- How to find duplicate elements in an array? (solution)
- Write a program to print Floyd's triangle in Java? (solution)
- How to find the largest prime factor of a number in Java? (solution)
- How to count the number of words in a given String? (solution)
- Write a Java program to print Pascal's triangle? (solution)
- How to find all permutations of a String in Java? (solution)
- Write a Java program to print the pyramid pattern of stars? (solution)
And lastly one question for you? How do you find if String contains only unique characters and no duplicates? This one is also a popular interview question and If you know let me know your approach and code in comments, all the best !!.
What is the time complexity of second solution using ASCII table? I guess its still O(n) right because you need to check for n characters in String and each check is O(1) because of array access? am I right? Any further way to improve this algorithm to find duplicate character let's say in log(N) or constant time?
ReplyDeletePlease put the code samples in a github repo so that we can try them easily.
ReplyDeleteCurrently if I copy paste the code samples,there is no formatting at all and there is an embedded link everywhere "Read more: http://javarevisited.blogspot.com/2016/06/how-to-remove-duplicate-characters-from-String-Java.html#ixzz4B9fdcSug"
Variable pos is an extra variable.It's not required.
ReplyDeleteHere is the removeDuplicates method without pos variable.
public static String removeDuplicates(String word) {
if (word == null || word.length() < 2) {
return word;
}
int pos = 1; // possible position of duplicate character
char[] characters = word.toCharArray();
for (int i = 1; i < word.length(); i++) {
int j;
for (j = 0; j < i; ++j) {
if (characters[i] == characters[j]) {
break;
}
}
if (j == i) {
continue;
} else {
characters[i] = 0;
// ++pos;
}
}
return toUniqueString(characters);
}
In removeDuplicates method, pos variable is not required
ReplyDeletepublic static String removeDuplicates(String word) {
if (word == null || word.length() < 2) {
return word;
}
char[] characters = word.toCharArray();
for (int i = 1; i < word.length(); i++) {
int j;
for (j = 0; j < i; ++j) {
if (characters[i] == characters[j]) {
break;
}
}
if (j != i) {
characters[i] = 0;
}
}
return toUniqueString(characters);
}
Hello @Deepak and @Ashish, good catch. Thanks
ReplyDeleteWhy do we need to do an extra traversal just to remove the null. What if we have 1M characters in the input string. We can avoid this extra traversal by building the string on the fly, i.e in the removeDuplicatesFromString method itself.
ReplyDeleteCan't we code like that:
ReplyDeletepublic static String removeDuplicates(String word) {
if (word == null || word.length() < 2) {
return word;
}
char[] characters = word.toCharArray();
for (int i = 1; i < word.length(); i++) {
int j;
for (j = 0; j < i; ++j) {
if (characters[i] == characters[j]) {
character[j] == 0;
}
}
}
return toUniqueString(characters);
}
Simpler (in my opinion):
ReplyDeletepublic String removeDuplicates(String literal) {
if (literal == null || literal.length() < 2) {
return literal;
}
StringBuilder sb = new StringBuilder(literal.length());
char[] characters = literal.toCharArray();
sb.append(Character.toString(characters[0]));
for(int i = 1; i < characters.length ; ++i) {
String strchar = Character.toString(characters[i]);
if (sb.indexOf(strchar) < 0) {
sb.append(strchar);
}
}
return sb.toString();
}
Even simpler with Java 8 (in my opinion):
ReplyDeletepublic String removeDuplicatesJ8(String word) {
if (word == null || word.length() < 2) {
return word;
}
StringBuilder sb = new StringBuilder(word.length());
word.chars().distinct()
.mapToObj(ch -> (char)ch).forEach((ch) -> sb.append(ch));
return sb.toString();
}
@Triki, indeed the Java 8 solution is very simple with clever use of distinct() you can remove duplicate character from a character stream. Thanks
ReplyDeletei like more general option, if we don't have limit of space complexity we can use some structure, i like for this problems HashMaps. my solution:
ReplyDeletepublic static String removeDuplicates(String word) {
if (word == null || word.length() <= 1) {
return word;
}
Set unduplicate = new HashSet<>(word.length());
for (int i = 0; i < word.length(); i++) {
unduplicate.add(word.charAt(i));
}
StringBuilder ret = new StringBuilder(unduplicate.size());
for (Character c : unduplicate) {
ret.append(c);
}
return ret.toString();
}
@Triki if you see the documentation of distinct for that implementation can see that:
ReplyDelete// While functional and quick to implement, this approach is not very efficient.
// An efficient version requires an int-specific map/set implementation.
Also i realy don't know if that will be more efficient, but yes is more simple ^^.
Gretings.
why not just add the items in a linkedhashmap/set then turn that into an array? the multiples arent stored and it keeps order
ReplyDelete@Anonymous, look my solution is the same like you said.
ReplyDeleteOk, MyserY, more efficient:
ReplyDeletepublic String removeDuplicatesV2(String word) {
if (word == null || word.length() < 2) {
return word;
}
StringBuilder sb = new StringBuilder(word.length());
sb.append(word.charAt(0));
for(int i = 1; i < word.length() ; ++i) {
String strchar = Character.toString(word.charAt(i));
if (sb.indexOf(strchar) < 0) {
sb.append(strchar);
}
}
return sb.toString();
}
@Triki, that is not more efficient. if you see the implementation of indexOf:
ReplyDeletepublic int indexOf(String str, int fromIndex) {
return String.indexOf(value, 0, count,
str.toCharArray(), 0, str.length(), fromIndex);
}
you will see it is an Iterative solution, is O(n*n).
When you check in Set you will use a Hash and solution is O(1) for check if character exist.
Gretings.
@MyserY, theoretically, but for a large enough "n" time of "removeDuplicatesV2" are better ...
ReplyDeleteTherefore, the complexity analysis is not complete. Perhaps the code-statements used are not atomic ...
i dont understand the loop conditions..
ReplyDeletewhy not this :
char[] array = input.toCharArray();
for(int i=0; i<input.length();i++){
for(int j=i+1;j<input.length();j++){
if(array[i] == array[j]){
array[j] = 0;
}
}
}
mayank, remove != replace with zero
ReplyDeletewrite a program to remove duplicate string from a sentence without using any predefined method.
ReplyDeleteExample : input :: abc hello is my abc is xyz
output:: abc hello is my xyz
Example 1:
ReplyDeleteinput1: 153 input2: jaya
Output: “jy1”
153 is armstrong number
remove duplicates occurring more than once (a will be removed)
diff in digits: 5-3-1 = 1
result: jy1
Example 2:
input1: 111 input2: bhanuchandra
Output: “bucdr3”
111 is not an armstrong number
remove duplicates occuring more than once (a, h, n will be removed)
sum of digits: 1+1+1 = 3
result: bucdr3