import java.util.regex.Matcher;
import java.util.regex.Pattern;
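public class Example {
    public static void main(String[] args) {
        // java.util.regex does not support the recursive (?R) construct; compiling it
        // throws PatternSyntaxException. One level of nesting is expanded by hand here,
        // so comments nested more deeply than that are not fully matched.
        final String regex = "<!--(?:(?!<!--|-->).|<!--(?:(?!<!--|-->).)*-->)*-->";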
        final String string = "Remove comments from some generated HTML which can be invalid with nested comments\n"
                + "==================================================================================\n\n"
                + "I would like to remove HTML comments from some generated content.\n"
                + "If I use the regex `/<!--(.*?)-->/` (ungreedy with the `?`) then it works for most cases such as this example:\n\n"
                + "```html\n"
                + "<!-- <h1> test </h1> --> not remove <!-- <h1> test 2 </h1> -->\n"
                + "```\n\n"
                + "It gets rid of the `<h1>` tags and leaves the \"*not remove*\" as desired.\n\n"
                + "But **if the comments are nested**, then it will not handle it properly as it will leave the last comment closing tag `'-->'`. The workaround would be to use a greedy pattern, but in this case it will not work for the first case, with multiple comments.\n\n"
                + "Example of nested comments (I know it's not valid HTML, but it's the backend which is generating it):\n\n"
                + "```html\n"
                + "text <!-- something <!-- <p> test </p> --> need remove -->\n"
                + "```\n\n"
                + "I've tried to find a solution, but I don't know how to solve this. Has anyone an idea how to handle it?\n\n"
                + "<!-- multiline\n"
                + "comment -->\n\n"
                + "<!-- <footer>Footer with\n"
                + " nested <!-- comment -->\n"
                + " on several lines.</footer>\n"
                + "-->";
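        // DOTALL lets '.' match line terminators, so multi-line comments are matched too.
        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // The pattern uses no capturing groups, so groupCount() is 0 and this loop
            // prints nothing; it is kept for patterns that do capture.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}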
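
Since `java.util.regex` has no recursion, one way to handle comments nested to any depth is to match only the delimiters and count the nesting level by hand. The following is a minimal sketch of that idea, not part of the generated sample: the class name `NestedCommentStripper` and the method `stripComments` are hypothetical, and it assumes that an unterminated comment should swallow the rest of the input while a stray `-->` outside any comment is left in place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedCommentStripper {
    // Matches either delimiter; nesting depth is tracked by the scanner below.
    private static final Pattern DELIMITER = Pattern.compile("<!--|-->");

    /** Removes comments, including nested ones, by counting open/close delimiters. */
    public static String stripComments(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = DELIMITER.matcher(input);
        int depth = 0;     // current comment nesting level
        int keepFrom = 0;  // start of the current run of text outside any comment
        while (m.find()) {
            if (m.group().equals("<!--")) {
                if (depth == 0) {
                    // Entering an outermost comment: keep the text that preceded it.
                    out.append(input, keepFrom, m.start());
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    // Outermost comment closed: resume keeping text after it.
                    keepFrom = m.end();
                }
            }
            // A stray "-->" at depth 0 is ignored and stays in the output.
        }
        // Assumption: an unterminated comment swallows the rest of the input.
        if (depth == 0) {
            out.append(input, keepFrom, input.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("text <!-- something <!-- <p> test </p> --> need remove -->"));
        System.out.println(stripComments("<!-- <h1> test </h1> --> not remove <!-- <h1> test 2 </h1> -->"));
    }
}
```

Because the scanner only ever looks for the two delimiters, it makes a single pass over the input and needs neither `DOTALL` nor a tempered `(?!...)` lookahead.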