package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?m)((\r\n|\r|\n){2,}|\p{Zp}+|\A)^\s*\S`)
	var str = `This is a paragraph that starts at the beginning of the file. It does not have any internal line breaks.
This is the second paragraph. It is preceded by two line feeds.
It also has an internal line feed, which does not count as an additional paragraph in this pattern.
This is the third paragraph, and is preceded by several line feeds and some spaces between some of those line feeds, all of which only counts as one paragraph separator in this pattern.
This is the fourth paragraph.
It begins and ends with other whitespace, as well as has an internal line feed and extra internal whitespace, and counts as one paragraph.
!!!!This is the fifth paragraph. It begins with punctuation.
@#$%^ This is the sixth paragraph. It begins with whitespace followed by punctuation.
This is the seventh paragraph. It is preceded by a tab character. This paragraph also contains the following note: This editor does not support the unicode paragraph separator character (UTF-16 U+2029 or UTF-8 0xE280A9) and, as such, does not demonstrate matching based on that character. You can test it yourself in a C# string by using the unicode literal escape sequence \u2029 or in a UTF-8 byte span using the UTF-8 sequence 0xE2 0x80 0xA9. In windows, some applications will accept it typed as alt+2029. Web browsers, the windows console, windows terminal, and windows clipboard also do not understand it. However, you aren't likely to encounter it or other unicode paragraph separator class characters in most situations, anyway, so you can remove ` + "`|\\p{Zp}+`" + ` from the pattern if you only want to include carriage return and line feed sequences as line breaks.
👨🏽‍💻This is the final paragraph. It begins with an emoji that lies outside the Unicode Basic Multilingual Plane. That emoji requires 4 UTF-16 codepoints, which clock in at 7 C# chars meaning "👨🏽‍💻".Length == 7, even though it is just one printable glyph. Those are: man, medium skin tone, zero width joiner, and laptop (U+1F468 U+1F3FD U+200D U+1F4BB). Three of those are surrogate pairs. This value is especially interesting in comparison with plain scans for ASCII carriage returns, as the U+200D codepoint contains the byte 0D, which is a carriage return if you scan for literal bytes without using the proper encoding. This paragraph also demonstrates that the line feeds following this paragraph do not cause an additional paragraph to be counted, as there are no more non-whitespace characters after this paragraph.
`
	for i, match := range re.FindAllString(str, -1) {
		fmt.Println(match, "found at index", i)
	}
}
Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Golang, please visit: https://golang.org/pkg/regexp/
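The loop in the sample prints each match next to its position in the result slice, not its position in the text. A minimal sketch of reporting byte offsets instead is shown here. It assumes a hypothetical helper named `paragraphStarts` and a slightly modified pattern in which the separator groups are non-capturing and the final `\S` is captured; neither is part of the generated sample.

```go
package main

import (
	"fmt"
	"regexp"
)

// paraStart is the sample's pattern with one assumed change: the separator
// groups are non-capturing and the final \S is captured, so group 1 marks
// the first non-whitespace character of each paragraph.
var paraStart = regexp.MustCompile(`(?m)(?:(?:\r\n|\r|\n){2,}|\p{Zp}+|\A)^\s*(\S)`)

// paragraphStarts (hypothetical helper) returns the byte offset of the first
// non-whitespace character of every paragraph found in s.
func paragraphStarts(s string) []int {
	var starts []int
	for _, loc := range paraStart.FindAllStringSubmatchIndex(s, -1) {
		// loc[0], loc[1] span the whole match; loc[2], loc[3] span group 1.
		starts = append(starts, loc[2])
	}
	return starts
}

func main() {
	// Three paragraphs: the second has an internal line feed, and the third
	// is preceded by line feeds mixed with spaces.
	text := "First.\n\nSecond\nstill second.\n\n \n  Third."
	for n, off := range paragraphStarts(text) {
		fmt.Printf("paragraph %d starts at byte %d\n", n+1, off)
	}
}
```

The offsets are byte offsets into the UTF-8 string, so they can be used directly for slicing (`text[off:]`) but do not count characters when the text contains multi-byte runes.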