【Leetcode】UTF-8 Validation

A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is 0, followed by the bits of its Unicode code point.

For an n-byte character, the first n bits are all ones and bit n+1 is 0, followed by n-1 bytes whose two most significant bits are 10.
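To see these rules on real data, one can encode a few characters with Python's built-in str.encode and look at each byte in binary. This is only an illustrative sketch: the helper name utf8_bits and the sample characters are choices made here, not part of the problem statement.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as a list of 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# One sample character for each length from 1 to 4 bytes.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print("U+%04X" % ord(ch), utf8_bits(ch))
```

In every row the first byte starts with 0, 110, 1110 or 11110, and every following byte starts with 10.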

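class Solution(object):
    def validUtf8(self, data):
        """
        :type data: List[int]
        :rtype: bool
        """
        # Number of bytes starting with 10 that are still expected.
        count = 0
        for d in data:
            if 128 <= d < 192:
                # Byte starts with 10. If none is expected (for example only
                # one was needed but two appear), the data is invalid.
                if count == 0:
                    return False
                count -= 1
            else:
                # A new character starts while bytes starting with 10 are
                # still missing from the previous one.
                if count:
                    return False
                if d < 128:
                    continue
                elif 192 <= d < 224:
                    count = 1
                elif 224 <= d < 240:
                    count = 2
                elif 240 <= d < 248:
                    count = 3
                else:
                    return False
        return count == 0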
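For trying the counting approach outside the LeetCode harness, here is a minimal standalone restatement of it as a plain function, followed by a few inputs. The function name valid_utf8 and the extra inputs are assumptions made for this sketch; the first two inputs are the examples commonly quoted with this problem.

```python
def valid_utf8(data):
    """Counting approach: track how many 10xxxxxx bytes are still expected."""
    count = 0
    for d in data:
        if 128 <= d < 192:        # byte starts with 10
            if count == 0:
                return False
            count -= 1
        elif count:               # new character started too early
            return False
        elif d < 128:             # single-byte character
            continue
        elif d < 224:             # 110xxxxx
            count = 1
        elif d < 240:             # 1110xxxx
            count = 2
        elif d < 248:             # 11110xxx
            count = 3
        else:                     # 11111xxx is never valid
            return False
    return count == 0

print(valid_utf8([197, 130, 1]))    # True
print(valid_utf8([235, 140, 4]))    # False
print(valid_utf8([130]))            # False
print(valid_utf8([240, 162, 138]))  # False
```

The last input fails only at the final check: it ends while one more byte starting with 10 is still expected.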

1 UTF-8 is a character encoding that uses 1 to 4 bytes to encode every Unicode code point.

2 If a character uses 1 byte, the first bit must be 0.

3 If a character uses n bytes, the first n bits must all be ones and bit n+1 must be 0, and each of the remaining n-1 bytes must have 10 as its two most significant bits.

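4 Iterate over the numbers in data and check which range each one falls in (a number that does not start with 10 is only allowed when count is 0):

<128: an ASCII character, so skip it;

128<=x<192: starts with 10; if count != 0, then count = count - 1; if count == 0, return False

192<=x<224: starts with 110, so exactly 1 number starting with 10 must follow, hence count = 1

224<=x<240: starts with 1110, so exactly 2 numbers starting with 10 must follow, hence count = 2

240<=x<248: starts with 11110, so exactly 3 numbers starting with 10 must follow, hence count = 3

Anything else: return False

Finally, check whether count equals 0; if it does, the number of bytes starting with 10 is correct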
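The numeric ranges above are just another way of writing the bit prefixes. As a hedged sketch of that correspondence, the hypothetical helper byte_kind below classifies a byte with bit masks instead of comparisons; the labels it returns are made up for this example.

```python
def byte_kind(d):
    """Classify one byte value (0..255) by its leading bits."""
    if (d & 0x80) == 0x00:   # 0xxxxxxx  ->   0..127
        return "ascii"
    if (d & 0xC0) == 0x80:   # 10xxxxxx  -> 128..191
        return "continuation"
    if (d & 0xE0) == 0xC0:   # 110xxxxx  -> 192..223
        return "lead2"
    if (d & 0xF0) == 0xE0:   # 1110xxxx  -> 224..239
        return "lead3"
    if (d & 0xF8) == 0xF0:   # 11110xxx  -> 240..247
        return "lead4"
    return "invalid"         # 11111xxx  -> 248..255

# Show the classification on both sides of every range boundary.
for d in (127, 128, 191, 192, 223, 224, 239, 240, 247, 248):
    print(d, format(d, "08b"), byte_kind(d))
```

Either form works for this problem; the range comparisons are shorter to write, while the masks make the bit patterns explicit.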

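5 Why check for a leading 10 first? Because the first number needs special care: if it starts with 10, the function should return False. count is initialized to 0, so if the very first number starts with 10, count is still 0 at that point and False is returned; if it does not start with 10, the other branches run.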

6 int("11001101", 2) converts a binary string to a decimal integer
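A short sketch of that conversion in both directions, using only built-ins; the variable name lead is chosen here for illustration.

```python
lead = int("11001101", 2)    # parse a binary string
print(lead)                  # 205
print(format(lead, "08b"))   # 11001101, back to 8 binary digits
print(192 <= lead < 224)     # True: a byte starting with 110
```

This is handy for turning the bit patterns in the problem statement into the decimal values found in data.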

7 A first number that starts with 10 is invalid.