From: 011netservice@gmail.com
Date: 2022-04-24
Subject: RegularExpression.txt
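Regular expressions are a classic use-it-or-lose-it skill.
Even simple syntax is easy to forget after a long break.
The details behind each symbol are the hardest part: ^*.+/\$... looks like a string of cartoon swearing, yet every symbol has its own meaning.
Usually I need just one pattern, but I end up reviewing everything again to find the right syntax.
So this file records some commonly used patterns, hopefully just enough to save time when they are needed.
To test a pattern, open this text file (it includes test strings) in Notepad++ and try the pattern in the Find dialog.
Also, in JS a regex literal /.../ can replace new RegExp. With new RegExp the pattern is a string, so every backslash must be doubled, for example: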
var re = new RegExp('^09\d{8}$') // => /^09d{8}$/
var re = new RegExp('^09\\d{8}$') // => /^09\d{8}$/
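The two lines above can be checked directly. The following is a minimal sketch; the variable names and the sample number 0912345678 are made up for illustration.

```javascript
// In a string literal, '\d' is just 'd', so the regex engine never sees \d.
const wrong = new RegExp('^09\d{8}$');
// Doubling the backslash keeps it in the pattern.
const right = new RegExp('^09\\d{8}$');
// A regex literal needs no doubling.
const literal = /^09\d{8}$/;

console.log(wrong.source);                     // pattern text as the engine sees it
console.log(right.source === literal.source);  // both forms describe the same pattern
console.log(literal.test('0912345678'));       // "09" followed by 8 digits
console.log(wrong.test('0912345678'));         // expects "09" followed by 8 letters d
```

Printing .source is a quick way to see what pattern the engine actually received.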
----------
20210320
Notepad++ Samples:
**** Common: (string A followed by string B; 有效, "valid", is the marker text in the test log lines)
stringA.*stringB
有效.*13
有效:.*000071
INFO\| 有效:.*000071
INFO\| 有效:.*9F
\bstringA\b.*\bstringB\b
\b有效\b.*\bP\b
^.+7(.+)3
^.+57(.+)13
\b57\b.*\b13\b
**** Common: (string A, then string B, then string C)
stringA.*stringB.*stringC
有效.*57.*13
INFO\| 有效:.*000071.*12F
**** Common: (string A or string B)
stringA|stringB
**** Common: three strings in order: 有效, 000057, 00013
\b有效\b.*\b000057\b.*\b00013\b
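The ordered-match patterns above can also be tried outside Notepad++. The following is a rough JavaScript sketch; the log line is an abbreviated copy of one test value from this file. One caveat: in JavaScript \b is defined in terms of ASCII word characters, so the \b有效\b form does not carry over, and the sketch uses \b only around ASCII tokens.

```javascript
// Abbreviated copy of one test log line from this file.
const line = '2021-03-11 14:38:41.7352|INFO| 有效: T=5s, (01-000071, S=0), ' +
  'R=(#00001, 00101, 96), B=(00018, 74, 13F, A=a500, 500Only=False), ' +
  'P=(12F, -666=101274-101940)';

// A.*B.*C matches only when the three strings appear in that order.
const inOrder = /有效.*000071.*12F/.test(line);
const reversed = /12F.*000071/.test(line);

// Without \b, "13" is found inside "13F"; with \b it must stand alone.
const loose13 = /13/.test(line);
const whole13 = /\b13\b/.test(line);
const whole13F = /\b13F\b/.test(line);

console.log({ inOrder, reversed, loose13, whole13, whole13F });
```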
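**** Common: find strings that contain an "@" and end at a ".", for example email addresses and URLs
^.+@(.+)\. --> matches from the start of the line through "@" up to the last ".", capturing the text between "@" and that "."
^.+@.+\. --> matches the same text; the only difference is that nothing is captured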
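The difference between the two patterns only shows up in the match result, not in what is matched. The following is a small sketch using two of the test values from this file; the variable names are illustrative.

```javascript
const withGroup = /^.+@(.+)\./;
const withoutGroup = /^.+@.+\./;

const m1 = 'aaa@gmail.com'.match(withGroup);
const m2 = 'aaa@gmail.com'.match(withoutGroup);

console.log(m1[0]);      // the whole matched text
console.log(m1[1]);      // the text captured by (.+)
console.log(m2[0]);      // same matched text as m1[0]
console.log(m2.length);  // only the whole match, no captured groups

// A line without "@" does not match either pattern.
const noAt = 'ddd.yahoo.com.tw'.match(withGroup);
console.log(noAt);
```

In a Find dialog both patterns highlight the same text, which is why no difference is visible there; the group matters only when the captured part is read or used in a replacement.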
**** Move patterns below up to the section above once they are confirmed to be commonly used
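**** Common query: each line starts with Start, ends with deed, and contains both kind and good in between (in either order).
^Start (?=.*kind)(?=.*good).* deed$
**** Common query: string A and string B and string C, in any order (every lookahead must succeed; for "or" use A|B|C)
(?=.*word1)(?=.*word2)(?=.*word3)
^[0-9]*[1-9][0-9]*$ positive integer
**** How to search for string 1 and string 2 in any order? Chain lookaheads from the start of the line: ^(?=.*string1)(?=.*string2)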
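Chained lookaheads behave as "and", while | behaves as "or". The following is a minimal sketch with made-up sample sentences to show the difference.

```javascript
// Each (?=...) is checked from the same position, so order does not matter.
const both = /^(?=.*kind)(?=.*good)/;
// Alternation succeeds when either word is present.
const either = /kind|good/;
// The framed version from the notes: starts with "Start ", ends with " deed".
const framed = /^Start (?=.*kind)(?=.*good).* deed$/;

const results = {
  bothForward: both.test('kind and good'),
  bothBackward: both.test('good and kind'),
  bothMissingOne: both.test('only kind'),
  eitherMissingOne: either.test('only kind'),
  framedOk: framed.test('Start a kind and good deed'),
  framedMissingKind: framed.test('Start a good deed'),
};
console.log(results);
```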
----------
20210320
Cheat Sheet
ref:
https://www.regextester.com/15
https://blog.techbridge.cc/2020/05/14/introduction-to-regular-expression/
Character classes
. any character except newline
\w \d \s word, digit, whitespace
\W \D \S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & g
Anchors
^abc$ start / end of the string
\b word boundary
Escaped characters
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
\u00A9 unicode escaped ©
Groups & Lookaround
(abc) capture group; captures the matched text
\1 backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookahead
Quantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd
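Other
a.b a and b separated by exactly one character
/xyz/i xyz, case-insensitive
^.+@(.+)\. --> matches from the start of the line through "@" up to the last ".", capturing the text between "@" and that "."
^.+@.+\. --> matches the same text; the only difference is that nothing is captured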
/^A(\d+)Z$/
/^A\d+Z$/
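A few cheat-sheet entries are easier to remember with a concrete run. The following is a small sketch of greedy versus lazy quantifiers, a backreference, and a non-capturing group; the sample strings are invented for illustration.

```javascript
const text = '<b>one</b> and <b>two</b>';

// Greedy .+ runs to the last </b>; lazy .+? stops at the first one.
const greedy = text.match(/<b>.+<\/b>/)[0];
const lazy = text.match(/<b>.+?<\/b>/)[0];

// \1 must repeat exactly what group #1 captured.
const repeated = /\b(\w+) \1\b/;
const hasRepeat = repeated.test('hello hello');
const noRepeat = repeated.test('hello world');

// (?:...) groups for repetition without creating a capture.
const grouped = 'ababab!'.match(/(?:ab)+/);

console.log(greedy);
console.log(lazy);
console.log(hasRepeat, noRepeat);
console.log(grouped[0], grouped.length);
```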
----------
Test values:
aaa@gmail.com
ccc@gmail.com
ddd.yahoo.com.tw
eee@msn.com
fff@ptt.com
Line 134: 2021-03-11 14:38:41.7352|INFO| 有效: T=5s, (01-000071, S=0), R=(#00001, 00101, 96), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 142: 2021-03-11 14:38:41.7508|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00002, 00102, 121), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 150: 2021-03-11 14:38:41.7508|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00003, 00103, 126), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -666=101274-101940), G=(23.775398, 120.191780, H=0, V=0).
Line 464: 2021-03-11 14:38:47.3373|INFO| 有效: T=6s, (01-000071, S=0), R=(#00001, 00101, 95), B=(00018, 70, 13F, A=a500, 500Only=False), P=(12F, -665=101275-101940), G=(23.775406, 120.191780, H=0, V=0).
Line 472: 2021-03-11 14:38:47.3373|INFO| 有效: T=16ms, (01-000071, S=0), R=(#00002, 00102, 122), B=(00018, 70, 13F, A=a500, 500Only=False), P=(12F, -665=101275-101940), G=(23.775406, 120.191780, H=0, V=0).
Line 791: 2021-03-11 14:38:52.9511|INFO| 有效: T=6s, (01-000071, S=0), R=(#00001, 00101, 94), B=(00018, 74, 13F, A=a500, 500Only=False), P=(12F, -668=101272-101940), G=(23.775410, 120.191780, H=0, V=0).
Line 1114: 2021-03-11 14:38:58.5812|INFO| 有效: T=6s, (01-000071, S=0), R=(#00002, 00102, 90), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1122: 2021-03-11 14:38:58.5969|INFO| 有效: T=-15ms, (01-000071, S=0), R=(#00001, 00101, 61), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1138: 2021-03-11 14:38:59.5029|INFO| 有效: T=922ms, (01-000071, S=0), R=(#00003, 00103, 95), B=(00018, 61, 13F, A=a500, 500Only=False), P=(12F, -677=101263-101940), G=(23.775415, 120.191780, H=0, V=0).
Line 1471: 2021-03-11 14:39:04.1958|INFO| 有效: T=5s, (01-000071, S=0), R=(#00001, 00101, 66), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).
Line 1479: 2021-03-11 14:39:04.1958|INFO| 有效: T=17ms, (01-000071, S=0), R=(#00002, 00102, 84), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).
Line 1487: 2021-03-11 14:39:04.2170|INFO| 有效: T=4ms, (01-000071, S=0), R=(#00003, 00103, 93), B=(00018, 60, 13F, A=a500, 500Only=False), P=(13F, -688=101252-101940), G=(23.775410, 120.191796, H=0, V=0).
----------
20200602
ref:
Regular expressions quick reference.pdf
----------
20181109
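Every LINE user account has a dedicated internal identifier called the User ID.
The User ID differs completely, in both format and purpose, from the LINE ID that a user chooses.
When developing a Messaging API application, the User ID must be used to represent the LINE user, whether receiving messages, sending messages, or calling other APIs.
A User ID is a 33-character alphanumeric string, for example U206d25c2ea6bd87c17655609a1c37cb8.
To verify that a string has the correct User ID format, test it against the regular expression "^U[0-9a-f]{32}$".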
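The format check described above could look like the following in JavaScript. This is only a sketch; the function name isLineUserId is hypothetical and not part of any LINE SDK.

```javascript
// "U" followed by exactly 32 lowercase hexadecimal characters (33 in total).
const USER_ID_PATTERN = /^U[0-9a-f]{32}$/;

// Hypothetical helper: checks the format only, not whether the user exists.
function isLineUserId(value) {
  return typeof value === 'string' && USER_ID_PATTERN.test(value);
}

console.log(isLineUserId('U206d25c2ea6bd87c17655609a1c37cb8')); // sample ID from the notes
console.log(isLineUserId('U206D25C2EA6BD87C17655609A1C37CB8')); // uppercase hex
console.log(isLineUserId('U206d25c2'));                         // too short
console.log(isLineUserId('my-line-id'));                        // a LINE ID, not a User ID
```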
----------
20181109
Regular Expressions
ref:
todo:
https://dotblogs.com.tw/johnny/archive/2010/01/25/13301.aspx
----------
The regular expression \p{P}*\s+ matches zero, one, or more punctuation characters followed by one or more white-space characters. It assumes that the total number of matches equals the approximate word count.
string pattern = @"\p{P}*\s+";
// Number of words.
int nWords = 0;
nWords = Regex.Matches(input, pattern).Count;
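The same counting idea can be sketched in JavaScript, where \p{P} needs the u flag. The function name approximateWordCount and the sample sentence are made up. The count is really the number of whitespace runs, so it is exact only when the text ends with whitespace.

```javascript
// Counts matches of "optional punctuation followed by whitespace".
function approximateWordCount(input) {
  const matches = input.match(/\p{P}*\s+/gu);
  return matches === null ? 0 : matches.length;
}

const sentence = 'Hello, world! How are you?';
console.log(approximateWordCount(sentence));        // one less than the word count: no trailing whitespace
console.log(approximateWordCount(sentence + '\n')); // trailing newline closes the last word
console.log(approximateWordCount(''));              // no matches at all
```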
----------
20201223
The following is ZRegularExpression.cs:
/*
ZRegularExpression.cs
20201122, Honda, Update for vs2019 v16.
Samples:
Find three strings that appear in order: 有效, 000057, 00013
20201211, tested OK in Notepad++.
\b有效\b.*\b000057\b.*\b00013\b
*/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
// add
using System.Text.RegularExpressions;
namespace ZLib
{
public static class ZRegularExpression
{
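/// <summary>
/// Checks whether the input string is a valid email address.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a valid email address.</returns>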
public static bool ZRegIsEmail(string input)
{
string pattern = @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
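/// <summary>
/// Checks whether the input string contains only English letters.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input contains only English letters.</returns>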
public static bool ZRegIsEnglisCh(string input)
{
Regex regex = new Regex("^[A-Za-z]+$");
return regex.IsMatch(input);
}
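/// <summary>
/// Checks whether the input string is an IPv4 address.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is an IPv4 address.</returns>
public static bool ZIsIPv4(string input)
{
string[] IPs = input.Split('.');
// An IPv4 address has exactly four dot-separated parts.
if (IPs.Length != 4)
{
return false;
}
// One to three ASCII digits per part, so Convert.ToUInt16 cannot overflow.
Regex regex = new Regex(@"^[0-9]{1,3}$");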
for (int i = 0; i < IPs.Length; i++)
{
if (!regex.IsMatch(IPs[i]))
{
return false;
}
if (Convert.ToUInt16(IPs[i]) > 255)
{
return false;
}
}
return true;
}
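/// <summary>
/// Checks whether the input string is a valid IPv6 address.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a valid IPv6 address.</returns>
/* *******************************************************************
* 1. Split the string on ":" and check that the resulting array has at most 8 elements.
* 2. Check whether the input contains "::".
* 3. If there is no "::", validate with ^([\da-f]{1,4}:){7}[\da-f]{1,4}$
* 4. If there is a "::", check that it appears only once.
* 5. If it appears more than once, return false.
* 6. Otherwise validate with ^(([\da-f]{1,4}:){0,5}[\da-f]{1,4})?::(([\da-f]{1,4}:){0,5}[\da-f]{1,4})?$
* Note: both patterns accept lowercase hex digits only.
* ******************************************************************/
public static bool ZIsIPV6(string input)
{
string pattern;
string temp = input;
string[] strs = temp.Split(':');
if (strs.Length > 8)
{
return false;
}
// Count how many times "::" appears in the input.
int count = ZRegGetStringCount(input, "::");
if (count > 1)
{
return false;
}
else if (count == 0)
{
pattern = @"^([\da-f]{1,4}:){7}[\da-f]{1,4}$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
else
{
// Optional groups before and after the single "::", for example fe80::1, ::1, or 1::.
pattern = @"^(([\da-f]{1,4}:){0,5}[\da-f]{1,4})?::(([\da-f]{1,4}:){0,5}[\da-f]{1,4})?$";
Regex regex1 = new Regex(pattern);
return regex1.IsMatch(input);
}
}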
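/// <summary>
/// Calls Regex.IsMatch to perform a general regular expression match.
/// </summary>
/// <param name="input">The string to search for a match.</param>
/// <param name="pattern">The regular expression pattern to match.</param>
/// <returns>True if the regular expression finds a match; otherwise false.</returns>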
public static bool ZRegIsMatch(string input, string pattern)
{
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
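/// <summary>
/// Checks whether the input string is a valid mobile phone number (11 digits starting with 13).
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a valid mobile phone number.</returns>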
public static bool ZRegIsMobilePhone(string input)
{
Regex regex = new Regex("^13\\d{9}$");
return regex.IsMatch(input);
}
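/// <summary>
/// Matches a non-negative integer.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a non-negative integer.</returns>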
public static bool ZRegIsNotNagtive(string input)
{
Regex regex = new Regex(@"^\d+$");
return regex.IsMatch(input);
}
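/// <summary>
/// Checks whether the input string contains only digits and English letters.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input contains only digits and English letters.</returns>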
public static bool ZRegIsNumAndEnCh(string input)
{
string pattern = @"^[A-Za-z0-9]+$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
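/// <summary>
/// Checks whether the input string is a number.
/// Matches integers and floating-point numbers.
/// ^-?\d+$|^(-?\d+)(\.\d+)?$
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is an integer or a floating-point number.</returns>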
public static bool ZRegIsNumber(string input)
{
string pattern = "^-?\\d+$|^(-?\\d+)(\\.\\d+)?$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
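/// <summary>
/// RegularExpressions mnemonic helper.
/// At least one digit.
/// At least one lowercase English letter.
/// At least one uppercase English letter.
/// Length between 6 and 30 characters.
/// </summary>
/// <param name="s1">The string to check.</param>
/// <returns>True if the string meets all of the rules above.</returns>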
public static bool ZRegIsPasswordUp6With1A(string s1)
{
Regex rx1 = new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,30}$");
return rx1.IsMatch(s1);
}
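/// <summary>
/// RegularExpressions mnemonic helper.
/// At least one digit.
/// At least one uppercase or lowercase English letter.
/// At least one special character.
/// Length between 6 and 30 characters.
/// </summary>
/// <param name="s1">The string to check.</param>
/// <returns>True if the string meets all of the rules above.</returns>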
public static bool ZRegIsPasswordUp6With1S(string s1)
{
Regex rx1 = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z])(?=.*\W).{6,30}$");
return rx1.IsMatch(s1);
}
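/// <summary>
/// Matches a phone number with a 3-digit or 4-digit area code. The area code may be wrapped in parentheses
/// or not, and the area code and local number may be separated by a hyphen or a space,
/// or not separated at all (the active pattern in the code expects 6 local digits after a 4-digit area code).
/// \(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}|\(0\d{3}\)[- ]?\d{7}|0\d{3}[- ]?\d{7}
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a phone number in one of these formats.</returns>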
public static bool ZRegIsPhone(string input)
{
//string pattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{7}$|^0\\d{3}[- ]?\\d{7}$";
string pattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{6}$|^0\\d{3}[- ]?\\d{6}$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
public static string ZRegIsTaiwanPhone(string sInput)
{
//try
//{
string sPattern = @"^\(?(\d{2})\)?[\s\-]?(\d{4})\-?(\d{4})$";
//string sPattern = "^\\(0\\d{2}\\)[- ]?\\d{8}$|^0\\d{2}[- ]?\\d{8}$|^\\(0\\d{3}\\)[- ]?\\d{6}$|^0\\d{3}[- ]?\\d{6}$";
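Regex r1 = new Regex(sPattern);
string s1 = sInput;
Match m1 = r1.Match(s1);
// Return an empty string when the input does not match, instead of formatting empty groups.
if (!m1.Success)
{
return string.Empty;
}
string s2 = string.Format("({0}) {1}-{2}", m1.Groups[1], m1.Groups[2], m1.Groups[3]);
return s2;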
//}
//catch (Exception ex)
//{
// mError.Set(ex);
// return string.Empty;
//}
}
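/// <summary>
/// RegularExpressions mnemonic helper.
/// xxxx-xxxxxx
/// </summary>
/// <param name="s1">The string to check.</param>
/// <returns>True if the string contains a number in this format.</returns>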
public static bool ZRegIsPhone_10(string s1)
{
Regex rx1 = new Regex(@"\b\d{4}-\d{6}");
return rx1.IsMatch(s1);
}
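/// <summary>
/// RegularExpressions mnemonic helper.
/// xxxx-xxxx
/// </summary>
/// <param name="s1">The string to check.</param>
/// <returns>True if the string contains a number in this format.</returns>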
public static bool ZRegIsPhone_8(string s1)
{
Regex rx1 = new Regex(@"\b\d\d\d\d-\d\d\d\d");
return rx1.IsMatch(s1);
}
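/// <summary>
/// Matches a positive integer.
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a positive integer.</returns>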
public static bool ZRegIsUint(string input)
{
Regex regex = new Regex("^[0-9]*[1-9][0-9]*$");
return regex.IsMatch(input);
}
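/// <summary>
/// Checks whether the input string is a hyperlink (URL).
/// </summary>
/// <param name="input">The string to check.</param>
/// <returns>True if the input is a URL.</returns>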
public static bool ZRegIsURL(string input)
{
//string pattern = @"http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
string pattern = @"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$";
Regex regex = new Regex(pattern);
return regex.IsMatch(input);
}
public static bool ZLike(string item, string searchPattern)
{
return ZREGGetRegex("^" + searchPattern).IsMatch(item);
}
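/// <summary>
/// Starting from the first character of the input string, replaces all matches of the given regular expression pattern with the replacement string.
/// </summary>
/// <param name="input">The input string.</param>
/// <param name="pattern">The pattern string.</param>
/// <param name="replacement">The replacement string.</param>
/// <returns>The result after replacement.</returns>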
public static string ZRegReplace(string input, string pattern, string replacement)
{
Regex regex = new Regex(pattern);
return regex.Replace(input, replacement);
}
public static string ZRegSearch(string item, string searchPattern)
{
var match = ZREGGetRegex(searchPattern).Match(item);
if (match.Success)
{
return item.Substring(match.Index, match.Length);
}
return null;
}
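/// <summary>
/// Splits the input string at the positions defined by the regular expression pattern.
/// </summary>
/// <param name="input">The input string.</param>
/// <param name="pattern">The pattern string.</param>
/// <returns>The array of substrings.</returns>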
public static string[] ZRegSplit(string input, string pattern)
{
Regex regex = new Regex(pattern);
return regex.Split(input);
}
public static List<string> ZRegExtract(string item, string searchPattern)
{
var result = ZRegSearch(item, searchPattern);
//if (!string.IsNullOrWhiteSpace(result)) // Supported only in .NET 4.0 and later
if (!string.IsNullOrEmpty(result))
{
var splitted = searchPattern.Split(new[] { '?', '%', '*', '#' }, StringSplitOptions.RemoveEmptyEntries);
var temp = result;
var final = new List<string>();
// Supported only in .NET 4.0 and later
//splitted.ForEach(x =>
//{
// var pos = temp.IndexOf(x);
// if (pos > 0)
// {
// final.Add(temp.Substring(0, pos));
// temp = temp.Substring(pos);
// }
// temp = temp.Substring(x.Length);
//});
foreach (string s1 in splitted)
{
var pos = temp.IndexOf(s1);
if (pos > 0)
{
final.Add(temp.Substring(0, pos));
temp = temp.Substring(pos);
}
temp = temp.Substring(s1.Length);
}
if (temp.Length > 0) final.Add(temp);
return final;
}
return null;
}
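/// <summary>
/// Computes the length of the string, counting each Chinese character as two characters.
/// </summary>
/// <param name="input">The string to measure.</param>
/// <returns>The computed length of the string.</returns>
public static int ZRegGetCount(string input)
{
return Regex.Replace(input, @"[\u4e00-\u9fa5]", "aa").Length;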
}
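/// <summary>
/// Counts how many times the string compare occurs in the string input.
/// </summary>
/// <param name="input">The source string.</param>
/// <param name="compare">The string to look for.</param>
/// <returns>The number of times compare occurs in input.</returns>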
private static int ZRegGetStringCount(string input, string compare)
{
int index = input.IndexOf(compare);
if (index != -1)
{
//return 1 + RegGetStringCount(input.Substring(index + compare.Length), compare);
//return 1 + input.Substring(index + compare.Length).ZRegGetStringCount(compare);
return 1 + ZRegGetStringCount(input.Substring(index + compare.Length), compare);
}
else
{
return 0;
}
}
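/// <summary>
/// Builds a Regex object from a wildcard string. Reserved characters are escaped and the wildcards #, ?, *, and % are translated automatically.
/// </summary>
/// <param name="searchPattern">The wildcard search string.</param>
/// <returns>A case-insensitive Regex for the search string.</returns>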
static Regex ZREGGetRegex(string searchPattern)
{
return new Regex(searchPattern
.Replace("\\", "\\\\")
.Replace(".", "\\.")
.Replace("{", "\\{")
.Replace("}", "\\}")
.Replace("[", "\\[")
.Replace("]", "\\]")
.Replace("+", "\\+")
.Replace("$", "\\$")
.Replace(" ", "\\s")
.Replace("#", "[0-9]")
.Replace("?", ".")
.Replace("*", "\\w*")
.Replace("%", ".*")
, RegexOptions.IgnoreCase);
}
public static string ZMatchValueAfter(string sAll, string sPattern)
{
// example return "13"
//StringBuilder sb1 = new StringBuilder();
//sb1.Append("GET /chat HTTP/1.1\r\n");
//sb1.Append("Host: localhost:80\r\n");
//sb1.Append("Upgrade: websocket\r\n");
//sb1.Append("Connection: Upgrade\r\n");
//sb1.Append("Origin: http://localhost:80\r\n");
//sb1.Append("Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==\r\n");
//sb1.Append("Sec-WebSocket-Version: 13\r\n");
//sb1.Append("\r\n");
//sAll = sb1.ToString();
//sPattern = "Sec-WebSocket-Key: (.*)";
Regex Regex1 = new Regex(sPattern, RegexOptions.IgnoreCase);
GroupCollection c1 = Regex1.Match(sAll).Groups;
//if (c1.Count < 1)
// return null;
return c1[1].Value.Trim();
}
public static string ZGetMatchValueExample()
{
// example:
StringBuilder sb1 = new StringBuilder();
sb1.Append("GET /chat HTTP/1.1\r\n");
sb1.Append("Host: localhost:80\r\n");
sb1.Append("Upgrade: websocket\r\n");
sb1.Append("Connection: Upgrade\r\n");
sb1.Append("Origin: http://localhost:80\r\n");
sb1.Append("Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==\r\n");
sb1.Append("Sec-WebSocket-Version: 13\r\n");
sb1.Append("\r\n");
string sHead = sb1.ToString();
//Regex webSocketKeyRegex = new Regex("Sec-WebSocket-Key: (.*)");
Regex webSocketVersionRegex = new Regex("Sec-WebSocket-Version: (.*)");
// check the version. Support version 13 and above
const int WebSocketVersion = 13;
int secWebSocketVersion = Convert.ToInt32(webSocketVersionRegex.Match(sHead).Groups[1].Value.Trim());
if (secWebSocketVersion < WebSocketVersion)
{
throw new Exception(string.Format("WebSocket Version {0} not supported. Must be {1} or above", secWebSocketVersion, WebSocketVersion));
}
//string secWebSocketKey = webSocketKeyRegex.Match(sHead).Groups[1].Value.Trim();
//string setWebSocketAccept = base.ComputeSocketAcceptString(secWebSocketKey);
//string response = ("HTTP/1.1 101 Switching Protocols\r\n"
// + "Connection: Upgrade\r\n"
// + "Upgrade: websocket\r\n"
// + "Sec-WebSocket-Accept: " + setWebSocketAccept);
// Sec-WebSocket-Key: 930vdInchBqkasHhQh6aIQ==
//string regexPattern = "Sec-WebSocket-Accept: (.*)";
//sPattern = regexPattern;
//Regex regex1 = new Regex(regexPattern);
// Group[0].value = Sec-WebSocket-Accept
// Group[1].value = (.*) for all matched.
//string sMatch = regex1.Match(sAll).Groups[1].Value.Trim();
// Case-insensitive
string header = "Upgrade: websocket";
Regex webSocketUpgradeRegex = new Regex("Upgrade: websocket", RegexOptions.IgnoreCase);
Match webSocketUpgradeRegexMatch = webSocketUpgradeRegex.Match(header);
if (webSocketUpgradeRegexMatch.Success)
{
return "Success";
}
else
{
return "Fail";
}
//return sMatch;
}
}
}
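The wildcard translation done by ZREGGetRegex and ZLike above can be sketched in JavaScript as follows. This is an illustration, not a port: the names WILDCARDS, wildcardToSource, and like are made up, the translation runs in a single pass per character instead of chained Replace calls, and it also escapes ^, |, ( and ), which the C# version leaves untouched.

```javascript
// Wildcard characters and the regex fragments they stand for.
const WILDCARDS = new Map([
  ['#', '[0-9]'],  // one digit
  ['?', '.'],      // any one character
  ['*', '\\w*'],   // any run of word characters
  ['%', '.*'],     // any run of characters
  [' ', '\\s'],    // one whitespace character
]);

// Translate wildcards and escape every other regex metacharacter.
function wildcardToSource(searchPattern) {
  return Array.from(searchPattern, (ch) =>
    WILDCARDS.has(ch) ? WILDCARDS.get(ch) : ch.replace(/[\\^$.|+()[\]{}]/, '\\$&')
  ).join('');
}

// Like ZLike: anchored at the start of the item, case-insensitive.
function like(item, searchPattern) {
  return new RegExp('^' + wildcardToSource(searchPattern), 'i').test(item);
}

console.log(wildcardToSource('A##-?*'));
console.log(like('a12-xyz', 'A##-?*'));
console.log(like('a1x-xyz', 'A##-?*'));
console.log(like('report.txt', '%.txt'));
console.log(like('report_txt', '%.txt'));
```

Mapping one character at a time avoids the ordering concerns of chained replacements, where an earlier replacement could introduce characters that a later one rewrites.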