Your future self will thank you for the effort you put in now!


iOS Low-Level Exploration Series: The Nature of Classes

Before starting this article, I assume you are comfortable with C pointers and have already built Apple's open-source objc4-756 runtime.
For LLDB commands in general, see the separate post on debugging with LLDB in Xcode.
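Two more commands worth knowing:

  • p/x prints a value in hexadecimal
  • x/4xg reads four 8-byte units (g = giant word) of memory starting at the given address and prints them in hex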

I. Viewing the C++ source

First, create a project and declare a FLYPerson class:

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface FLYPerson : NSObject{
NSString *hobby;
}

@property (nonatomic, copy) NSString *nickName;

- (void)sayHello;
- (void)sayByeBye;
- (void)sayGoGo;
+ (void)sayHappy;

@end

NS_ASSUME_NONNULL_END
#import "FLYPerson.h"

@implementation FLYPerson

- (void)sayHello {

NSLog(@"FLYPerson say : Hello!!!");
}

- (void)sayByeBye {

NSLog(@"FLYPerson say : ByeBye!!!");
}

- (void)sayGoGo {

NSLog(@"FLYPerson say : GoGo!!!");
}

+ (void)sayHappy {

NSLog(@"FLYPerson say : Happy!!!");
}

@end

In main, create an instance:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import "FLYPerson.h"

int main(int argc, const char * argv[]) {
@autoreleasepool {
// insert code here...
NSLog(@"Hello, World!");
FLYPerson * person = [[FLYPerson alloc] init];
Class pClass = object_getClass(person);
[person sayHello];
[person sayByeBye];
[person sayGoGo];

NSLog(@"%@ - %p", person, pClass);
}
return 0;
}

Use clang to rewrite the source into C++ (cd into the file's directory first):

  • clang -rewrite-objc main.m -o main.cpp
  • If the file imports UIKit or other frameworks:
    clang -rewrite-objc -fobjc-arc -fobjc-runtime=ios-13.0.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator13.0.sdk main.m
  • Using xcrun:
    Simulator: xcrun -sdk iphonesimulator clang -arch arm64 -rewrite-objc main.m -o main-arm64.cpp
    Device: xcrun -sdk iphoneos clang -arch arm64 -rewrite-objc main.m -o main-arm64.cpp

II. Looking at FLYPerson in the generated .cpp file

Since we are investigating classes, that is, the Class type, search the .cpp file. You will find:

typedef struct objc_class * Class;

So Class is simply a pointer to an objc_class struct.
Searching further turns up the declaration of objc_class:

struct objc_class {
Class _Nonnull isa __attribute__((deprecated));
} __attribute__((unavailable));

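This declaration is marked deprecated and unavailable. Is that a dead end? Not at all. We already have the objc4-756 source, and searching it for objc_class turns up a few important things: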

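The search results (a screenshot in the original post) confirm that Class really is objc_class. The runtime source declares typedef struct objc_class *Class; and defines the real struct objc_class in objc-runtime-new.h.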

III. Analyzing the objc_class source

1. The main data structures

The objc_class struct is defined in objc-runtime-new.h as struct objc_class : objc_object:

struct objc_class : objc_object {
// Class ISA;
Class superclass;
cache_t cache; // formerly cache pointer and vtable
class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags

... // member functions omitted; the fields above are all of its data members
}

Now look at the type of each field. ISA appears only as a comment (// Class ISA;) because objc_class inherits from objc_object, which already declares isa:

struct objc_object {
Class _Nonnull isa OBJC_ISA_AVAILABILITY;
};

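Note:
In this public declaration (objc.h), isa is typed as Class. Inside the runtime, however, objc-private.h declares objc_object's isa as an isa_t union. On 64-bit platforms this is usually a non-pointer isa that packs the class address together with extra bits. That is why the class address has to be extracted with a mask later on.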

cache_t caches SEL/IMP pairs. It grows once it would become more than 3/4 full:

struct cache_t {
struct bucket_t *_buckets; // pointer: 8 bytes
mask_t _mask; // mask_t is uint32_t: 32 / 8 = 4 bytes
mask_t _occupied; // mask_t is uint32_t: 32 / 8 = 4 bytes
... // all member functions omitted
}

class_data_bits_t has only one data member, bits. To read what it refers to, convert it with class_rw_t* data():

struct class_data_bits_t {

// Values are the FAST_ flags above.
uintptr_t bits;
private:
...
public:
class_rw_t* data() {
return (class_rw_t *)(bits & FAST_DATA_MASK);
}
void setData(class_rw_t *newData)
{
assert(!data() || (newData->flags & (RW_REALIZING | RW_FUTURE)));
// Set during realization or construction only. No locking needed.
// Use a store-release fence because there may be concurrent
// readers of data and data's contents.
uintptr_t newBits = (bits & ~FAST_DATA_MASK) | (uintptr_t)newData;
atomic_thread_fence(memory_order_release);
bits = newBits;
}
... // remaining functions omitted
}

The class_rw_t struct:

struct class_rw_t {
// Be warned that Symbolication knows the layout of this structure.
uint32_t flags;
uint32_t version;

const class_ro_t *ro;

method_array_t methods;
property_array_t properties;
protocol_array_t protocols;

Class firstSubclass;
Class nextSiblingClass;

char *demangledName;

#if SUPPORT_INDEXED_ISA
uint32_t index;
#endif

void setFlags(uint32_t set)
{
OSAtomicOr32Barrier(set, &flags);
}

void clearFlags(uint32_t clear)
{
OSAtomicXor32Barrier(clear, &flags);
}

// set and clear must not overlap
void changeFlags(uint32_t set, uint32_t clear)
{
assert((set & clear) == 0);

uint32_t oldf, newf;
do {
oldf = flags;
newf = (oldf | set) & ~clear;
} while (!OSAtomicCompareAndSwap32Barrier(oldf, newf, (volatile int32_t *)&flags));
}
};

The class_ro_t struct:

struct class_ro_t {
uint32_t flags;
uint32_t instanceStart;
uint32_t instanceSize;
#ifdef __LP64__
uint32_t reserved;
#endif

const uint8_t * ivarLayout;

const char * name;
method_list_t * baseMethodList;
protocol_list_t * baseProtocols;
const ivar_list_t * ivars;

const uint8_t * weakIvarLayout;
property_list_t *baseProperties;

method_list_t *baseMethods() const {
return baseMethodList;
}
};

2. Computing the size of each field in objc_class

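The data members of objc_class, with their sizes on a 64-bit platform:
Class ISA; // 8 bytes
Class superclass; // 8 bytes
cache_t cache; // 16 bytes: an 8-byte pointer plus two 4-byte mask_t fields
class_data_bits_t bits; // 8 bytes (class_rw_t * plus custom rr/alloc flags)
So bits starts 8 + 8 + 16 = 32 bytes (0x20) after the start of the class.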

3. Reading ivars, properties, and methods from the class

Back in main, use LLDB to dump the memory of the FLYPerson class object (pClass):

(lldb) x/4gx pClass
0x1000021f8: 0x001d8001000021d1 0x0000000100b37140
0x100002208: 0x00000001003da290 0x0000000000000000

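Offsetting 0x1000021f8 by 8 bytes lands exactly on the superclass field. po 0x100002200 prints <NSObject: 0x100002200> because LLDB treats that slot as an object whose isa is the value stored there. That value is 0x0000000100b37140, the NSObject class. po 0x0000000100b37140 would print the superclass itself, NSObject.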

(lldb) po 0x100002200
<NSObject: 0x100002200>

By the same pointer arithmetic, 0x100002208 is 16 bytes past 0x1000021f8. It therefore points exactly at the start of objc_class's cache field:

(lldb) po 0x100002208
4294976008

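Printing the cache address yields only a number. LLDB could not interpret the address as an object, so po fell back to printing the integer itself. 4294976008 is just 0x100002208 in decimal, not a combination of the cache's fields. Set the cache aside for now: since cache_t is 16 bytes, adding 16 to the cache address reaches the start of bits: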

(lldb) po 0x100002218
objc[14430]: Attempt to use unknown class 0x101e09dd0.
4294976024

(lldb) p 0x100002218
(long) $5 = 4294976024

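Neither command is useful. po complains about an unknown class, and 0x101e09dd0 turns out to be the class_rw_t pointer that data() returns below. p just prints the number. Since this memory is no longer an Objective-C object, cast the address to the right type and use p from here on: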

(lldb) p (class_data_bits_t *)0x100002218
(class_data_bits_t *) $7 = 0x0000000100002218

Now use class_data_bits_t's data() method. It extracts the class_rw_t pointer by masking bits with FAST_DATA_MASK:

class_rw_t* data() {
return (class_rw_t *)(bits & FAST_DATA_MASK);
}

Keep going:

(lldb) p $7->data()
(class_rw_t *) $9 = 0x0000000101e09dd0

// dereference $9 to see the class_rw_t itself
(lldb) p *$9
(class_rw_t) $10 = {
flags = 2148139008
version = 0
ro = 0x0000000100002170
methods = {
list_array_tt<method_t, method_list_t> = {
= {
list = 0x00000001000020a8
arrayAndFlag = 4294975656
}
}
}
properties = {
list_array_tt<property_t, property_list_t> = {
= {
list = 0x0000000100002158
arrayAndFlag = 4294975832
}
}
}
protocols = {
list_array_tt<unsigned long, protocol_list_t> = {
= {
list = 0x0000000000000000
arrayAndFlag = 0
}
}
}
firstSubclass = nil
nextSiblingClass = NSUUID
demangledName = 0x0000000000000000
}

Next, follow ro to the class_ro_t:

(lldb) p $10.ro
(const class_ro_t *) $11 = 0x0000000100002170
(lldb) p *$11
(const class_ro_t) $12 = {
flags = 388
instanceStart = 8
instanceSize = 24
reserved = 0
ivarLayout = 0x0000000100000f45 "\x02"
name = 0x0000000100000f3b "FLYPerson"
baseMethodList = 0x00000001000020a8
baseProtocols = 0x0000000000000000
ivars = 0x0000000100002110
weakIvarLayout = 0x0000000000000000
baseProperties = 0x0000000100002158
}

Properties:

(lldb) p $12.baseProperties
(property_list_t *const) $13 = 0x0000000100002158
(lldb) p *$13
(property_list_t) $14 = {
entsize_list_tt<property_t, property_list_t, 0> = {
entsizeAndFlags = 16
count = 1
first = (name = "nickName", attributes = "T@\"NSString\",C,N,V_nickName")
}
}

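Instance variables. count = 2 covers hobby and _nickName (the property's backing ivar); only the first entry is printed here: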

(lldb) p $12.ivars
(const ivar_list_t *const) $15 = 0x0000000100002110
(lldb) p *$15
(const ivar_list_t) $16 = {
entsize_list_tt<ivar_t, ivar_list_t, 0> = {
entsizeAndFlags = 32
count = 2
first = {
offset = 0x00000001000021c0
name = 0x0000000100000f7d "hobby"
type = 0x0000000100000fa8 "@\"NSString\""
alignment_raw = 3
size = 8
}
}
}

Method list:

(lldb) p $12.baseMethodList
(method_list_t *const) $17 = 0x00000001000020a8
(lldb) p *$17
(method_list_t) $18 = {
entsize_list_tt<method_t, method_list_t, 3> = {
entsizeAndFlags = 26
count = 4
first = {
name = "sayHello"
types = 0x0000000100000f8d "v16@0:8"
imp = 0x0000000100000c90 (FLYTest`-[FLYPerson sayHello] at FLYPerson.m:12)
}
}
}

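count = 4 means there are four methods. Print them one by one with get(), a member function of the list type (see entsize_list_tt in the runtime source). Note that sayByeBye and sayGoGo do not appear. This dump was presumably captured from an earlier version of FLYPerson that did not implement them yet.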

(lldb) p $18.get(0)
(method_t) $19 = {
name = "sayHello"
types = 0x0000000100000f8d "v16@0:8"
imp = 0x0000000100000c90 (FLYTest`-[FLYPerson sayHello] at FLYPerson.m:12)
}
(lldb) p $18.get(1)
(method_t) $20 = {
name = ".cxx_destruct"
types = 0x0000000100000f8d "v16@0:8"
imp = 0x0000000100000d60 (FLYTest`-[FLYPerson .cxx_destruct] at FLYPerson.m:10)
}
(lldb) p $18.get(2)
(method_t) $21 = {
name = "setNickName:"
types = 0x0000000100000f9d "v24@0:8@16"
imp = 0x0000000100000d20 (FLYTest`-[FLYPerson setNickName:] at FLYPerson.h:16)
}
(lldb) p $18.get(3)
(method_t) $22 = {
name = "nickName"
types = 0x0000000100000f95 "@16@0:8"
imp = 0x0000000100000cf0 (FLYTest`-[FLYPerson nickName] at FLYPerson.h:16)
}

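None of these is a class method, because class methods are stored in the metaclass. To reach it, read the class's isa, which points to its metaclass: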

(lldb) x/4xg pClass
0x1000021f8: 0x001d8001000021d1 0x0000000100b37140
0x100002208: 0x00000001003da290 0x0000000000000000

(lldb) po 0x001d8001000021d1 & 0x00007ffffffffff8
FLYPerson

(lldb) p/x 0x001d8001000021d1 & 0x00007ffffffffff8
(long) $3 = 0x00000001000021d0

(lldb) x/4gx 0x00000001000021d0
0x1000021d0: 0x001d800100b370f1 0x0000000100b370f0
0x1000021e0: 0x000000010112bcb0 0x0000000100000003

(lldb) p (class_data_bits_t *)0x1000021f0
(class_data_bits_t *) $5 = 0x00000001000021f0

(lldb) p $5->data()
(class_rw_t *) $7 = 0x0000000101111ac0

(lldb) p $7->ro
(const class_ro_t *) $8 = 0x0000000100002060

(lldb) p *$8
(const class_ro_t) $9 = {
flags = 389
instanceStart = 40
instanceSize = 40
reserved = 0
ivarLayout = 0x0000000000000000
name = 0x0000000100000f3b "FLYPerson"
baseMethodList = 0x0000000100002040
baseProtocols = 0x0000000000000000
ivars = 0x0000000000000000
weakIvarLayout = 0x0000000000000000
baseProperties = 0x0000000000000000
}

(lldb) p $9.baseMethodList
(method_list_t *const) $11 = 0x0000000100002040

(lldb) p *$11
(method_list_t) $12 = {
entsize_list_tt<method_t, method_list_t, 3> = {
entsizeAndFlags = 26
count = 1
first = {
name = "sayHappy"
types = 0x0000000100000f8d "v16@0:8"
imp = 0x0000000100000cc0 (FLYTest`+[FLYPerson sayHappy] at FLYPerson.m:16)
}
}
}

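Masking the isa gives the metaclass address, 0x1000021d0. From there the walk is the same as before: bits at offset 0x20, then data(), then ro. The metaclass's baseMethodList contains the class method sayHappy. 0x00007ffffffffff8 is ISA_MASK on x86_64; where it comes from will be covered in another article.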

Summary:
To get ivars, properties, and methods, the path is objc_class -> bits -> class_rw_t *data() -> const class_ro_t *ro -> the corresponding list.
Note that class_rw_t also has these three fields of its own:

  • method_array_t methods;
  • property_array_t properties;
  • protocol_array_t protocols;

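I am not yet sure what these three are for and will fill that in later. One clue from the dumps above: methods.list (0x00000001000020a8) and properties.list (0x0000000100002158) hold the same addresses as ro's baseMethodList and baseProperties.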

4. How the method cache works

The full cache_t struct:

struct cache_t {
struct bucket_t *_buckets; // the bucket storage
mask_t _mask; // capacity - 1, used as the hash mask
mask_t _occupied; // number of buckets in use

public:
struct bucket_t *buckets();
mask_t mask();
mask_t occupied();
void incrementOccupied();
void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
void initializeToEmpty();

mask_t capacity();
bool isConstantEmptyCache();
bool canBeFreed();

static size_t bytesForCapacity(uint32_t cap);
static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);

void expand();
void reallocate(mask_t oldCapacity, mask_t newCapacity);
struct bucket_t * find(cache_key_t key, id receiver);

static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn));
};

The bucket_t struct:

struct bucket_t {
private:
// IMP-first is better for arm64e ptrauth and no worse for arm64.
// SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
MethodCacheIMP _imp;
cache_key_t _key;
#else
cache_key_t _key;
MethodCacheIMP _imp;
#endif

public:
inline cache_key_t key() const { return _key; }
inline IMP imp() const { return (IMP)_imp; }
inline void setKey(cache_key_t newKey) { _key = newKey; }
inline void setImp(IMP newImp) { _imp = newImp; }

void set(cache_key_t newKey, IMP newImp);
};

So each SEL/IMP pair is stored in a bucket_t.
Run the project with a breakpoint on [person sayGoGo] and inspect memory with LLDB:

(lldb) x person
0x102100020: 4d 22 00 00 01 80 1d 00 00 00 00 00 00 00 00 00 M"..............
0x102100030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
(lldb) p (cache_t *)0x102100030
(cache_t *) $1 = 0x0000000102100030
(lldb) p *$1
(cache_t) $2 = {
_buckets = 0x0000000000000000
_mask = 0
_occupied = 0
}
(lldb) x pClass
0x100002248: 21 22 00 00 01 80 1d 00 40 71 b3 00 01 00 00 00 !"......@q......
0x100002258: 80 01 10 02 01 00 00 00 03 00 00 00 03 00 00 00 ................
(lldb) p (cache_t *)0x100002258
(cache_t *) $4 = 0x0000000100002258
(lldb) p *$4
(cache_t) $5 = {
_buckets = 0x0000000102100180
_mask = 3
_occupied = 3
}
(lldb) p $5._buckets
(bucket_t *) $6 = 0x0000000102100180
(lldb) p *$6
(bucket_t) $7 = {
_key = 4294971196
_imp = 0x0000000100000ba0 (FLYTest`-[FLYPerson sayHello] at FLYPerson.m:12)
}

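We first read a cache_t 16 bytes past person and found only zeros. We then read the cache_t of its class (pClass + 0x10) and found real data: _mask = 3 (capacity 4), _occupied = 3, and the first bucket holds sayHello.

Strictly speaking, an instance has no cache_t at all. After its isa come its ivars (hobby at +8 and _nickName at +16, both still nil). The first read therefore only covered _nickName and the 8 bytes that follow the 24-byte object.

The conclusion still holds: the method cache lives in the class object that the instance's isa points to. That is the class, not the metaclass; the metaclass's cache holds class methods.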
Since methods are cached in the class, let's look directly at the class's caching strategy.

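After a fair amount of experimentation, the steps taken when an object calls a method are roughly as follows:
[Figure: overview of the cache_t flow]
Here is the key function: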

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
cacheUpdateLock.assertLocked();

// Never cache before +initialize is done
if (!cls->isInitialized()) return;

// Make sure the entry wasn't added to the cache by some other thread
// before we grabbed the cacheUpdateLock.
if (cache_getImp(cls, sel)) return;

cache_t *cache = getCache(cls);
cache_key_t key = getKey(sel);

// Use the cache as-is if it is less than 3/4 full
mask_t newOccupied = cache->occupied() + 1;
mask_t capacity = cache->capacity();
if (cache->isConstantEmptyCache()) {
// Cache is read-only. Replace it.
cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
}
else if (newOccupied <= capacity / 4 * 3) {
// Cache is less than 3/4 full. Use it as-is.
}
else {
// Cache is too full. Expand it.
cache->expand();
}

// Scan for the first unused slot and insert there.
// There is guaranteed to be an empty slot because the
// minimum size is 4 and we resized at 3/4 full.
bucket_t *bucket = cache->find(key, receiver);
if (bucket->key() == 0) cache->incrementOccupied();
bucket->set(key, imp);
}
cache_t *getCache(Class cls) 
{
assert(cls);
return &cls->cache;
}

cache_key_t getKey(SEL sel)
{
assert(sel);
return (cache_key_t)sel;
}

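These two functions are simple. getCache returns the class's cache. getKey turns the SEL into a cache_key_t, an unsigned integer (uintptr_t). A SEL is a pointer to a uniqued selector, so its address alone identifies the selector, and an integer key is cheap to hash and compare.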
cache->capacity():

mask_t cache_t::capacity() 
{
return mask() ? mask()+1 : 0;
}

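capacity() returns the total number of buckets, mask + 1, or 0 if the mask is 0. In other words, _mask stores capacity - 1, which is what makes key & mask a valid index.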

cache->reallocate():

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
bool freeOld = canBeFreed();

bucket_t *oldBuckets = buckets();
bucket_t *newBuckets = allocateBuckets(newCapacity);

// Cache's old contents are not propagated.
// This is thought to save cache memory at the cost of extra cache fills.
// fixme re-measure this

assert(newCapacity > 0);
assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

setBucketsAndMask(newBuckets, newCapacity - 1);

if (freeOld) {
cache_collect_free(oldBuckets, oldCapacity);
cache_collect(false);
}
}

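reallocate() allocates a new bucket array and, when the old one can be freed, releases the old buckets. The old entries are not copied over ("Cache's old contents are not propagated"), so everything cached so far is discarded.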

cache->expand():

void cache_t::expand()
{
cacheUpdateLock.assertLocked();

uint32_t oldCapacity = capacity();
uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

if ((uint32_t)(mask_t)newCapacity != newCapacity) {
// mask overflow - can't grow further
// fixme this wastes one bit of mask
newCapacity = oldCapacity;
}

reallocate(oldCapacity, newCapacity);
}

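expand() grows the cache:

- If the old capacity is 0, it allocates INIT_CACHE_SIZE (4) buckets.
- Otherwise it doubles the capacity.

The new mask is therefore 4 - 1, or 2 × old capacity - 1. Some architectures use an end marker (x86_64, i386 and 32-bit arm, but not arm64). On those, the last slot of the new bucket array gets a marker entry with key 1, and the imp stored there differs by architecture.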

cache->find():
bucket_t * cache_t::find(cache_key_t k, id receiver)
{
assert(k != 0);

bucket_t *b = buckets();
mask_t m = mask();
mask_t begin = cache_hash(k, m);
mask_t i = begin;
do {
if (b[i].key() == 0 || b[i].key() == k) {
return &b[i];
}
} while ((i = cache_next(i, m)) != begin);

// hack
Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
cache_t::bad_cache(receiver, (SEL)k, cls);
}

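find() returns the bucket that already holds this key or, failing that, the first empty bucket it reaches. If it wraps all the way around without finding either, it calls bad_cache. The scan starts at cache_hash(k, m), which is k & m, rather than at index 0. It then advances with cache_next: forward with wrap-around on x86_64, backward on arm64. That is why methods are not stored in call order; each one's slot depends on its SEL address.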

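Summary:
With each function's role known, the caching policy follows directly:

- If the cache is still the empty placeholder, a 4-bucket cache is allocated.
- Otherwise, if the occupancy after adding this entry is at most 3/4 of the capacity, the entry goes into the existing buckets.
- If it would exceed 3/4, the cache doubles in size. Growing discards everything cached so far, and then the current method is inserted.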

IV. Final thoughts

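The above is just one way to read a class's data structures: reading raw memory, computing offsets, and analyzing structs. What matters most is the approach and the process. The results matter too, of course; now we have something to show off!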