From: 011netservice@gmail.com
Date: 2022-04-24
Subject: RegularExpression.txt

Regular expressions are a classic "use it or lose it" technique. Even simple syntax slips away after a long gap, especially the details of each symbol: ^ * . + / \ $ all look like cartoon swearing, yet each one has its own meaning.
Usually only one specific pattern is needed, but finding it means reviewing everything again.
This file records the patterns used most often, so that there is just enough on hand when needed, to save time.
To test a pattern, open this text file (it includes test strings) in Notepad++ and run it in Find with Search Mode set to "Regular expression".

In JS, a regex literal /.../ can be used in place of new RegExp(...). When the pattern is passed as a string, every backslash must be doubled:
var re = new RegExp('^09\d{8}$')   // => /^09d{8}$/   (backslash lost: \d became d)
var re = new RegExp('^09\\d{8}$')  // => /^09\d{8}$/

---------- 20210320 Notepad++ Samples:

**** Common: StringA followed by StringB
StringA.*StringB
有效.*13
有效:.*000071
INFO\| 有效:.*000071
INFO\| 有效:.*9F
\bStringA\b.*\bStringB\b
\b有效\b.*\bP\b
^.+7(.+)3
^.+57(.+)13
\b57\b.*\b13\b

**** Common: StringA, then StringB, then StringC
StringA.*StringB.*StringC
有效.*57.*13
INFO\| 有效:.*000071.*12F

**** Common: StringA or StringB
StringA|StringB

**** Common: three strings in order: 有效, 000057, 00013
\b有效\b.*\b000057\b.*\b00013\b

**** Common: strings with an "@" followed later by a ".", e.g. email addresses, URLs
^.+@(.+)\.  --> matches from the start of the line through the last "." after an "@"; group 1 captures the text between the "@" and that ".".
^.+@.+\.    --> same match as above; the only difference is that there is no capture group.

**** Move patterns above this line once they are confirmed as commonly used

**** Common search: lines that start with "Start ", end with " deed", and contain both kind and good (in any order).
^Start (?=.*kind)(?=.*good).* deed$
For kind or good instead of both, use alternation:
^Start .*(kind|good).* deed$

**** Common search: StringA and StringB and StringC, in any order
(?=.*word1)(?=.*word2)(?=.*word3)

^[0-9]*[1-9][0-9]*$   positive integer

**** How to search for String1 and String2 regardless of order?
----> Use one lookahead per string, as in the pattern above: ^(?=.*String1)(?=.*String2)

---------- 20210320 Cheat Sheet
ref:
https://www.regextester.com/15
https://blog.techbridge.cc/2020/05/14/introduction-to-regular-expression/

Character classes
.           any character except newline
\w \d \s    word, digit, whitespace
\W \D \S    not word, digit, whitespace
[abc]       any of a, b, or c
[^abc]      not a, b, or c
[a-g]       character between a & g

Anchors
^abc$       start / end of the string
\b          word boundary

Escaped characters
\.
\* \\       escaped special characters
\t \n \r    tab, linefeed, carriage return
\u00A9      unicode escaped ©

Groups & Lookaround
(abc)       capture group; captures the matched text
\1          backreference to group #1
(?:abc)     non-capturing group
(?=abc)     positive lookahead
(?!abc)     negative lookahead

Quantifiers & Alternation
a* a+ a?    0 or more, 1 or more, 0 or 1
a{5} a{2,}  exactly five a's, two or more a's
a{1,3}      between one & three
a+? a{2,}?  match as few as possible
ab|cd       match ab or cd

Other
a.b         exactly one character between a and b
/xyz/i      xyz, ignoring case
^.+@(.+)\.  --> from line start through the last "." after an "@"; group 1 captures the text between them.
^.+@.+\.    --> same match as above, without the capture group.
/^A(\d+)Z$/   A, one or more digits, Z; the digits are captured as group 1
/^A\d+Z$/     same match, without the capture group

---------- Test values:
aaa@gmail.com
ccc@gmail.com
ddd.yahoo.com.tw
eee@msn.com
fff@ptt.com
Line 134: 2021-03-11 14:38:41.7352|INFO| 有效: T=5s, (01-000071, S=0), R=(#00001, 00101, 96), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 142: 2021-03-11 14:38:41.7508|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00002, 00102, 121), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 150: 2021-03-11 14:38:41.7508|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00003, 00103, 126), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 464: 2021-03-11 14:38:47.3373|INFO| 有效: T=6s, (01-000071, S=0), R=(#00001, 00101, 95), B=(00018, 70, 13F, A=a500, 500Only=False), P=(12F, -665=101275-101940), G=(23.775406, 120.191780, H=0, V=0).
Line 472: 2021-03-11 14:38:47.3373|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00002, 00102, 122), B=(00018, 70, 13F, A=a500, 500Only=False), P=(12F, -665=101275-101940), G=(23.775406, 120.191780, H=0, V=0).
Line 791: 2021-03-11 14:38:52.9511|INFO| 有效: T=6s, (01-000071, S=0), R=(#00001, 00101, 94), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -668=101272-101940), G=(23.775410, 120.191780, H=0, V=0).
Line 1114: 2021-03-11 14:38:58.5812|INFO| 有效: T=6s, (01-000071, S=0), R=(#00002, 00102, 90), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1122: 2021-03-11 14:38:58.5969|INFO| 有效: T=-15ms, (01-000071, S=0), R=(#00001, 00101, 61), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1138: 2021-03-11 14:38:59.5029|INFO| 有效: T=922ms, (01-000071, S=0), R=(#00003, 00103, 95), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1471: 2021-03-11 14:39:04.1958|INFO| 有效: T=5s, (01-000071, S=0), R=(#00001, 00101, 66), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).
Line 1479: 2021-03-11 14:39:04.1958|INFO| 有效: T=17ms, (01-000071, S=0), R=(#00002, 00102, 84), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).
Line 1487: 2021-03-11 14:39:04.2170|INFO| 有效: T=4ms, (01-000071, S=0), R=(#00003, 00103, 93), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).

---------- 20200602
ref: Regular expressions quick reference.pdf

---------- 20181109
Every LINE user account has a dedicated internal identifier called the User ID.
The User ID differs completely, in both format and purpose, from the LINE ID that a user chooses.
When developing a Messaging API application, the User ID must be used to identify the LINE user, whether receiving messages, sending messages, or calling other APIs.
A User ID is a 33-character alphanumeric string ("U" followed by 32 lowercase hex characters), for example U206d25c2ea6bd87c17655609a1c37cb8.
To check whether a string has the correct User ID format, test it against the regular expression ^U[0-9a-f]{32}$

---------- 20181109 Regular Expressions
ref: todo: https://dotblogs.com.tw/johnny/archive/2010/01/25/13301.aspx

----------
The regular expression \p{P}*\s+ matches zero, one, or more punctuation characters followed by one or more white-space characters. It assumes that the total number of matches equals the approximate word count.
string pattern = @"\p{P}*\s+";
// Number of words.
int nWords = Regex.Matches(input, pattern).Count;

---------- 20201223
The following is ZRegularExpression.cs:

/*
ZRegularExpression.cs
20201122, Honda, Update for vs2019 v16.

Samples:
Find three strings that appear in order: 有效, 000057, 00013
20201211, tested OK in Notepad++.
\b有效\b.*\b000057\b.*\b00013\b
*/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
// add
using System.Text.RegularExpressions;

namespace ZLib
{
    public static class ZRegularExpression
    {
        /// <summary>
        /// Checks whether the input string is a valid email address.
        /// </summary>
        public static bool ZRegIsEmail(string input)
        {
            string pattern = @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string contains only English letters.
        /// </summary>
        public static bool ZRegIsEnglisCh(string input)
        {
            Regex regex = new Regex("^[A-Za-z]+$");
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string is an IPv4 address.
        /// </summary>
        /// <param name="input">The string to check.</param>
        /// <returns>True if the string is an IPv4 address.</returns>
        public static bool ZIsIPv4(string input)
        {
            string[] IPs = input.Split('.');
            // An IPv4 address has exactly four dot-separated parts.
            if (IPs.Length != 4)
            {
                return false;
            }
            // 1 to 3 ASCII digits per part, so Convert.ToUInt16 cannot overflow.
            Regex regex = new Regex(@"^[0-9]{1,3}$");
            for (int i = 0; i < IPs.Length; i++)
            {
                if (!regex.IsMatch(IPs[i]))
                {
                    return false;
                }
                if (Convert.ToUInt16(IPs[i]) > 255)
                {
                    return false;
                }
            }
            return true;
        }

        /// <summary>
        /// Checks whether the input string is a valid IPv6 address.
        /// </summary>
        /* *******************************************************************
         * 1. Split the string on ":" and check that the array has at most 8 elements.
         * 2. Check whether the IPv6 string contains "::".
         * 3. If there is no "::", validate with ^([\da-f]{1,4}:){7}[\da-f]{1,4}$
         * 4. If there is a "::", check that it appears only once.
         * 5. If it appears more than once, return false.
         * 6. Otherwise validate with ^([\da-f]{1,4}:){0,5}::([\da-f]{1,4}:){0,5}[\da-f]{1,4}$
         * ******************************************************************/
        public static bool ZIsIPV6(string input)
        {
            string pattern;
            string temp = input;
            string[] strs = temp.Split(':');
            if (strs.Length > 8)
            {
                return false;
            }
            // Count the occurrences of "::" with the helper defined later in this class.
            int count = ZRegGetStringCount(input, "::");
            if (count > 1)
            {
                return false;
            }
            else if (count == 0)
            {
                pattern = @"^([\da-f]{1,4}:){7}[\da-f]{1,4}$";
                Regex regex =
                    new Regex(pattern, RegexOptions.IgnoreCase); // accept upper-case hex digits too
                return regex.IsMatch(input);
            }
            else
            {
                pattern = @"^([\da-f]{1,4}:){0,5}::([\da-f]{1,4}:){0,5}[\da-f]{1,4}$";
                Regex regex1 = new Regex(pattern, RegexOptions.IgnoreCase);
                return regex1.IsMatch(input);
            }
        }

        /// <summary>
        /// Calls Regex.IsMatch to perform a general regular expression match.
        /// </summary>
        /// <param name="input">The string to search for a match.</param>
        /// <param name="pattern">The regular expression pattern to match.</param>
        /// <returns>true if the regular expression finds a match; otherwise false.</returns>
        public static bool ZRegIsMatch(string input, string pattern)
        {
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string is a valid mobile phone number
        /// (11 digits starting with 13).
        /// </summary>
        public static bool ZRegIsMobilePhone(string input)
        {
            Regex regex = new Regex("^13\\d{9}$");
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Matches a non-negative integer.
        /// </summary>
        public static bool ZRegIsNotNagtive(string input)
        {
            Regex regex = new Regex(@"^\d+$");
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string contains only digits and English letters.
        /// </summary>
        public static bool ZRegIsNumAndEnCh(string input)
        {
            string pattern = @"^[A-Za-z0-9]+$";
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string is a number.
        /// Matches integers and floating-point numbers.
        /// ^-?\d+$|^(-?\d+)(\.\d+)?$
        /// </summary>
        public static bool ZRegIsNumber(string input)
        {
            string pattern = "^-?\\d+$|^(-?\\d+)(\\.\\d+)?$";
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// RegularExpressions memory-aid function.
        /// At least one digit,
        /// at least one lowercase English letter,
        /// at least one uppercase English letter,
        /// and a length of 6 to 30 characters.
        /// </summary>
        public static bool ZRegIsPasswordUp6With1A(string s1)
        {
            Regex rx1 = new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,30}$");
            return rx1.IsMatch(s1);
        }

        /// <summary>
        /// RegularExpressions memory-aid function.
        /// At least one digit,
        /// at least one uppercase or lowercase English letter,
        /// at least one special symbol,
        /// and a length of 6 to 30 characters.
        /// </summary>
        public static bool ZRegIsPasswordUp6With1S(string s1)
        {
            Regex rx1 = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z])(?=.*\W).{6,30}$");
            return rx1.IsMatch(s1);
        }

        /// <summary>
        /// Matches phone numbers with a 3-digit or 4-digit area code. The area code
        /// may or may not be wrapped in parentheses, and may be separated from the
        /// local number by a hyphen, a space, or nothing.
        /// \(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}|\(0\d{3}\)[- ]?\d{7}|0\d{3}[- ]?\d{7}
        /// </summary>
        public static bool ZRegIsPhone(string input)
        {
            //string pattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- 
            // ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{7}$|^0\\d{3}[- ]?\\d{7}$";
            // Active variant: a 4-digit area code is followed by 6 digits.
            string pattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{6}$|^0\\d{3}[- ]?\\d{6}$";
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Reformats a phone number such as 02-1234-5678 as "(02) 1234-5678".
        /// Returns an empty string when the input does not match.
        /// </summary>
        public static string ZRegIsTaiwanPhone(string sInput)
        {
            //try
            //{
            string sPattern = @"^\(?(\d{2})\)?[\s\-]?(\d{4})\-?(\d{4})$";
            //string sPattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{6}$|^0\\d{3}[- ]?\\d{6}$";
            Regex r1 = new Regex(sPattern);
            string s1 = sInput;
            Match m1 = r1.Match(s1);
            // Without this check a failed match would be formatted as "() -".
            if (!m1.Success)
            {
                return string.Empty;
            }
            string s2 = string.Format("({0}) {1}-{2}", m1.Groups[1], m1.Groups[2], m1.Groups[3]);
            return s2;
            //}
            //catch (Exception ex)
            //{
            //    mError.Set(ex);
            //    return string.Empty;
            //}
        }

        /// <summary>
        /// RegularExpressions memory-aid function.
        /// xxxx-xxxxxx
        /// </summary>
        public static bool ZRegIsPhone_10(string s1)
        {
            Regex rx1 = new Regex(@"\b\d{4}-\d{6}");
            return rx1.IsMatch(s1);
        }

        /// <summary>
        /// RegularExpressions memory-aid function.
        /// xxxx-xxxx
        /// </summary>
        public static bool ZRegIsPhone_8(string s1)
        {
            Regex rx1 = new Regex(@"\b\d\d\d\d-\d\d\d\d");
            return rx1.IsMatch(s1);
        }

        /// <summary>
        /// Matches a positive integer.
        /// </summary>
        public static bool ZRegIsUint(string input)
        {
            Regex regex = new Regex("^[0-9]*[1-9][0-9]*$");
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Checks whether the input string is a hyperlink.
        /// </summary>
        public static bool ZRegIsURL(string input)
        {
            //string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
            string pattern = @"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$";
            Regex regex = new Regex(pattern);
            return regex.IsMatch(input);
        }

        /// <summary>
        /// Wildcard match anchored at the start of item.
        /// See ZREGGetRegex for the wildcard characters.
        /// </summary>
        public static bool ZLike(string item, string searchPattern)
        {
            return ZREGGetRegex("^" + searchPattern).IsMatch(item);
        }

        /// <summary>
        /// Replaces every match of the regular expression pattern in the input
        /// string, starting from the first character, with the replacement string.
        /// </summary>
        /// <param name="input">The input string.</param>
        /// <param name="pattern">The pattern string.</param>
        /// <param name="replacement">The replacement string.</param>
        /// <returns>The string after replacement.</returns>
        public static string ZRegReplace(string input, string pattern, string replacement)
        {
            Regex regex = new Regex(pattern);
            return regex.Replace(input, replacement);
        }

        public static string ZRegSearch(string item, string searchPattern)
        {
            var match =
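                ZREGGetRegex(searchPattern).Match(item);
            if (match.Success)
            {
                return item.Substring(match.Index, match.Length);
            }
            return null;
        }

        /// <summary>
        /// Splits the input string at the positions defined by the regular expression pattern.
        /// </summary>
        /// <param name="input">The input string.</param>
        /// <param name="pattern">The pattern string.</param>
        public static string[] ZRegSplit(string input, string pattern)
        {
            Regex regex = new Regex(pattern);
            return regex.Split(input);
        }

        public static List<string> ZRegExtract(string item, string searchPattern)
        {
            var result = ZRegSearch(item, searchPattern);
            //if (!string.IsNullOrWhiteSpace(result)) requires .NET 4.0
            if (!string.IsNullOrEmpty(result))
            {
                // The literal pieces of the pattern, i.e. the text between wildcards.
                var splitted = searchPattern.Split(new[] { '?', '%', '*', '#' }, StringSplitOptions.RemoveEmptyEntries);
                var temp = result;
                var final = new List<string>();
                // requires .NET 4.0
                //splitted.ForEach(x =>
                //{
                //    var pos = temp.IndexOf(x);
                //    if (pos > 0)
                //    {
                //        final.Add(temp.Substring(0, pos));
                //        temp = temp.Substring(pos);
                //    }
                //    temp = temp.Substring(x.Length);
                //});
                foreach (string s1 in splitted)
                {
                    // The regex is case-insensitive, so compare the literal the same way.
                    var pos = temp.IndexOf(s1, StringComparison.OrdinalIgnoreCase);
                    if (pos < 0)
                    {
                        continue; // literal not found as-is; leave temp untouched
                    }
                    if (pos > 0)
                    {
                        // Text before the literal was matched by a wildcard.
                        final.Add(temp.Substring(0, pos));
                        temp = temp.Substring(pos);
                    }
                    temp = temp.Substring(s1.Length);
                }
                if (temp.Length > 0)
                    final.Add(temp);
                return final;
            }
            return null;
        }

        /// <summary>
        /// Computes the character length of a string, counting each Chinese
        /// character (U+4E00 to U+9FA5) as two characters.
        /// </summary>
        /// <param name="input">The string to measure.</param>
        /// <returns>The length of the string.</returns>
        public static int ZRegGetCount(string input)
        {
            return Regex.Replace(input, @"[\u4e00-\u9fa5]", "aa").Length;
        }

        /// <summary>
        /// Counts how many times the string compare occurs in the string input.
        /// </summary>
        /// <param name="input">The source string.</param>
        /// <param name="compare">The string to look for.</param>
        /// <returns>The number of occurrences of compare in input.</returns>
        private static int ZRegGetStringCount(string input, string compare)
        {
            int index = input.IndexOf(compare);
            if (index != -1)
            {
                //return 1 + RegGetStringCount(input.Substring(index + compare.Length), compare);
                //return 1 + input.Substring(index + compare.Length).ZRegGetStringCount(compare);
                return 1 + ZRegGetStringCount(input.Substring(index + compare.Length), compare);
            }
            else
            {
                return 0;
            }
        }

        /// <summary>
        /// Builds a Regex object from a wildcard string.
        /// Reserved characters in the string are replaced automatically.
        /// </summary>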
        static Regex ZREGGetRegex(string searchPattern)
        {
            // Regex metacharacters are escaped first, then the wildcards are expanded:
            //   #  one digit            ?  any single character
            //   *  zero or more word characters (\w*)
            //   %  any run of characters (.*)
            // A space matches any whitespace character.
            // Note: ( ) ^ | are not escaped and keep their regex meaning.
            return new Regex(searchPattern
                .Replace("\\", "\\\\")
                .Replace(".", "\\.")
                .Replace("{", "\\{")
                .Replace("}", "\\}")
                .Replace("[", "\\[")
                .Replace("]", "\\]")
                .Replace("+", "\\+")
                .Replace("$", "\\$")
                .Replace(" ", "\\s")
                .Replace("#", "[0-9]")
                .Replace("?", ".")
                .Replace("*", "\\w*")
                .Replace("%", ".*")
                , RegexOptions.IgnoreCase);
        }

        public static string ZMatchValueAfter(string sAll, string sPattern)
        {
            // Example: the input below returns "13".
            //StringBuilder sb1 = new StringBuilder();
            //sb1.Append("GET /chat HTTP/1.1\r\n");
            //sb1.Append("Host: localhost:80\r\n");
            //sb1.Append("Upgrade: websocket\r\n");
            //sb1.Append("Connection: Upgrade\r\n");
            //sb1.Append("Origin: http://localhost:80\r\n");
            //sb1.Append("Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==\r\n");
            //sb1.Append("Sec-WebSocket-Version: 13\r\n");
            //sb1.Append("\r\n");
            //sAll = sb1.ToString();
            //sPattern = "Sec-WebSocket-Version: (.*)";
            Regex Regex1 = new Regex(sPattern, RegexOptions.IgnoreCase);
            GroupCollection c1 = Regex1.Match(sAll).Groups;
            //if (c1.Count < 1)
            //    return null;
            // Groups[1] is the first capture group; Trim() removes the trailing "\r".
            return c1[1].Value.Trim();
        }

        public static string ZGetMatchValueExample()
        {
            // example:
            StringBuilder sb1 = new StringBuilder();
            sb1.Append("GET /chat HTTP/1.1\r\n");
            sb1.Append("Host: localhost:80\r\n");
            sb1.Append("Upgrade: websocket\r\n");
            sb1.Append("Connection: Upgrade\r\n");
            sb1.Append("Origin: http://localhost:80\r\n");
            sb1.Append("Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==\r\n");
            sb1.Append("Sec-WebSocket-Version: 13\r\n");
            sb1.Append("\r\n");
            string sHead = sb1.ToString();
            //Regex webSocketKeyRegex = new Regex("Sec-WebSocket-Key: (.*)");
            Regex webSocketVersionRegex = new Regex("Sec-WebSocket-Version: (.*)");
            // check the version.
            // Support version 13 and above
            const int WebSocketVersion = 13;
            int secWebSocketVersion = Convert.ToInt32(webSocketVersionRegex.Match(sHead).Groups[1].Value.Trim());
            if (secWebSocketVersion < WebSocketVersion)
            {
                throw new Exception(string.Format("WebSocket Version {0} not supported. Must be {1} or above", secWebSocketVersion, WebSocketVersion));
            }
            //string secWebSocketKey = webSocketKeyRegex.Match(sHead).Groups[1].Value.Trim();
            //string setWebSocketAccept = base.ComputeSocketAcceptString(secWebSocketKey);
            //string response = ("HTTP/1.1 101 Switching Protocols\r\n"
            //    + "Connection: Upgrade\r\n"
            //    + "Upgrade: websocket\r\n"
            //    + "Sec-WebSocket-Accept: " + setWebSocketAccept);
            // Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==
            //string regexPattern = "Sec-WebSocket-Accept: (.*)";
            //sPattern = regexPattern;
            //Regex regex1 = new Regex(regexPattern);
            // Groups[0].Value = the whole match, starting at "Sec-WebSocket-Accept: "
            // Groups[1].Value = the text captured by (.*)
            //string sMatch = regex1.Match(sAll).Groups[1].Value.Trim();

            // Case-insensitive match
            string header = "Upgrade: websocket";
            Regex webSocketUpgradeRegex = new Regex("Upgrade: websocket", RegexOptions.IgnoreCase);
            Match webSocketUpgradeRegexMatch = webSocketUpgradeRegex.Match(header);
            if (webSocketUpgradeRegexMatch.Success)
            {
                return "Success";
            }
            else
            {
                return "Fail";
            }
            //return sMatch;
        }
    }
}
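
---------- Quick check in JS

Three of the notes near the top of this file can be tried quickly in JavaScript: doubling backslashes in new RegExp, in-order versus any-order searches, and the LINE User ID pattern. This is only a rough sketch under assumptions. The helper names escapeRegExp, hasAllInAnyOrder and isLineUserId are invented for this note, and the sample line is a shortened copy of one of the test values above.

```javascript
// Sketch only; the helper names below are hypothetical, not from any library.

// 1. A pattern passed as a string needs doubled backslashes.
const lost = new RegExp('^09\d{8}$');   // the string literal turns \d into d
const kept = new RegExp('^09\\d{8}$');  // equivalent to the literal /^09\d{8}$/

// 2a. In-order search: A, then B, then C on the same line.
const inOrder = /有效.*000071.*12F/;

// Escape regex metacharacters so that a word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// 2b. Any-order search: one lookahead per word, all checked from the line start.
function hasAllInAnyOrder(line, words) {
  const lookaheads = words.map((w) => '(?=.*' + escapeRegExp(w) + ')').join('');
  return new RegExp('^' + lookaheads).test(line);
}

// 3. LINE User ID: "U" followed by 32 lowercase hex characters.
function isLineUserId(text) {
  return /^U[0-9a-f]{32}$/.test(text);
}

// Shortened copy of a test value from this file.
const sample = '2021-03-11 14:38:41.7352|INFO| 有效: T=5s, (01-000071, S=0), P=(12F, -666=101274-101940)';

console.log(lost.source, kept.source);
console.log(kept.test('0912345678'));
console.log(inOrder.test(sample));
console.log(hasAllInAnyOrder(sample, ['12F', '有效', '000071']));
console.log(isLineUserId('U206d25c2ea6bd87c17655609a1c37cb8'));
```

The any-order helper escapes each word before building the pattern, so a search word such as "S=0)" is treated as plain text and not as regex syntax. In Notepad++ the same idea is typed by hand as ^(?=.*word1)(?=.*word2).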