YARA Engine Investigation

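Rule generality

Cobalt Strike (CS) 4.0 was used to generate the following payloads: 10 PowerShell scripts, 12 PE backdoors and 2 DLL files, 24 samples in total.

The MD5 values of these samples all differ, so the original approach of blacklisting MD5 hashes is not viable.

Scanning them with the open-source CS YARA rules (GitHub - chronicle/GCTI) detects every sample, and one sample (beacon_server64.exe) is reported twice. The final detection rate is 24/24 = 100%.

When no obfuscation or evasion is involved, static text features generalize well, and they are free of the limitations of an MD5 blacklist.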
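
To make the contrast concrete, here is a minimal sketch of why a hash blacklist misses variants while a content signature still matches. FNV-1a is used purely as a small stand-in for MD5 (an assumption made to keep the example self-contained), and the marker string is hypothetical rather than a real Cobalt Strike feature.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a stands in for MD5 here: like any file hash, its value
// changes as soon as a single byte of the input changes.
std::uint64_t fnv1a(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Content-based check: true if the sample contains any signature string.
bool matches_signature(const std::string& sample,
                       const std::vector<std::string>& signatures) {
    for (const auto& sig : signatures) {
        if (sample.find(sig) != std::string::npos) return true;
    }
    return false;
}
```

Two payloads that differ only in an embedded configuration byte get different hashes, so a blacklist built from the first never fires on the second, whereas `matches_signature` flags both as long as the shared marker survives.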

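How to embed YARA into the product engine

According to the official YARA documentation, there are two ways to embed it:

  1. Invoke the compiled yara engine directly from the command line;
  2. Call the yara C API to perform the virus scan;

The two methods perform the same calls underneath, because the officially provided yara binary is itself implemented on top of the yara C API.

The sample and YARA rule used here are:

Contents of test.txt:

111

Contents of test.yar: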

rule test
{
    strings:
        $s1 = "111"

    condition:
        any of ($s*)
}

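Invoking the compiled yara engine from the command line for virus scanning

The compiled yara engine (yara.exe on Windows) is shipped inside the product and invoked through the command line whenever a scan is needed. A demo: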

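#include <cstdlib>
#include <iostream>
#include <string>

#define RULES_FILE "D:\\workstation\\yara_demo\\yara_demo\\test.yar"
#define VIRUS_FILE "D:\\workstation\\yara_demo\\yara_demo\\test.txt"

int main() {
    const std::string rule_file = RULES_FILE;
    const std::string virus_file = VIRUS_FILE;
    // Quote both paths so that directories containing spaces still work.
    const std::string cmdline = "yara \"" + rule_file + "\" \"" + virus_file + "\"";
    // Run the yara scan; yara must be reachable through PATH.
    const int status = std::system(cmdline.c_str());
    if (status != 0) {
        std::cerr << "yara exited with status " << status << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}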

The scan result is as follows:
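
The demo above only lets yara print to the console. A product would also need to read the matches back, so the following is a minimal sketch of one way to do that, assuming yara's default output of one `rule_name file_path` pair per line. `YaraMatch`, `parse_match_line` and `run_yara` are hypothetical helper names, not part of YARA.

```cpp
#include <cstdio>
#include <string>
#include <vector>

#ifdef _WIN32
#define DEMO_POPEN _popen
#define DEMO_PCLOSE _pclose
#else
#define DEMO_POPEN popen
#define DEMO_PCLOSE pclose
#endif

struct YaraMatch {
    std::string rule;  // rule identifier (never contains spaces)
    std::string path;  // scanned file path (may contain spaces)
};

// Splits "rule_name file_path" at the first space; returns false if malformed.
bool parse_match_line(const std::string& line, YaraMatch& out) {
    const std::size_t space = line.find(' ');
    if (space == std::string::npos || space == 0 || space + 1 >= line.size())
        return false;
    out.rule = line.substr(0, space);
    out.path = line.substr(space + 1);
    return true;
}

// Runs the given command line and collects every match printed on stdout.
std::vector<YaraMatch> run_yara(const std::string& cmdline) {
    std::vector<YaraMatch> matches;
    FILE* pipe = DEMO_POPEN(cmdline.c_str(), "r");
    if (pipe == NULL) return matches;
    char buf[4096];
    while (std::fgets(buf, sizeof buf, pipe) != NULL) {
        std::string line(buf);
        // Strip the trailing line ending before parsing.
        while (!line.empty() && (line.back() == '\n' || line.back() == '\r'))
            line.pop_back();
        YaraMatch m;
        if (parse_match_line(line, m)) matches.push_back(m);
    }
    DEMO_PCLOSE(pipe);
    return matches;
}
```

Calling `run_yara` with the same command line as the demo would return one `YaraMatch` per reported rule, which the product can then act on instead of relying on console output.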

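Using the yara C API for virus scanning

Add yara.h to the include directory and libyara64.lib to the library directory, then add libyara64.lib, ws2_32.lib and crypt32.lib to the linker's additional dependencies.

A simple demo based on the official documentation: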

#include <iostream>
#include <windows.h>
#include <yara.h>

#define RULES_FILE "D:\\workstation\\yara_demo\\yara_demo\\test.yar"
#define VIRUS_FILE "D:\\workstation\\yara_demo\\yara_demo\\test.txt"

// Scan callback: libyara reports the events of a scan through this function.
int my_callback_function(YR_SCAN_CONTEXT* context, int message, void* message_data, void* user_data)
{
    if (message == CALLBACK_MSG_RULE_MATCHING) {
        std::cout << "Matched!" << std::endl;
        return CALLBACK_ABORT;  // stop at the first matching rule
    }

    // The demo relies on this being reached only when no rule matched,
    // since a match aborts the scan above.
    if (message == CALLBACK_MSG_SCAN_FINISHED) {
        std::cout << "Not Matched!" << std::endl;
        return CALLBACK_ABORT;
    }

    return CALLBACK_CONTINUE;
}

int main(int argc, char** argv)
{
    // Open the rule file
    HANDLE hFile = CreateFileA(RULES_FILE,
        GENERIC_READ,
        0,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        std::cerr << "Failed to open rule file for reading" << std::endl;
        return EXIT_FAILURE;
    }

    // Open the sample file
    HANDLE vFile = CreateFileA(VIRUS_FILE,
        GENERIC_READ,
        0,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);
    if (vFile == INVALID_HANDLE_VALUE) {
        std::cerr << "Failed to open virus file for reading" << std::endl;
        CloseHandle(hFile);
        return EXIT_FAILURE;
    }

    // Initialize the YARA engine
    int result = yr_initialize();
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to initialize Yara engine" << std::endl;
        CloseHandle(hFile);
        CloseHandle(vFile);
        return EXIT_FAILURE;
    }

    YR_COMPILER* compiler = NULL;
    YR_RULES* rules = NULL;

    // Release everything acquired so far; used on the success and error paths.
    auto cleanup = [&]() {
        if (rules != NULL) yr_rules_destroy(rules);
        if (compiler != NULL) yr_compiler_destroy(compiler);
        yr_finalize();
        CloseHandle(hFile);
        CloseHandle(vFile);
    };

    // Create the compiler object
    result = yr_compiler_create(&compiler);
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to create Yara compiler" << std::endl;
        cleanup();
        return EXIT_FAILURE;
    }

    // Load the rules; yr_compiler_add_fd returns the number of errors found
    result = yr_compiler_add_fd(compiler, hFile, NULL, NULL);
    if (result != 0) {
        std::cerr << "Failed to add file contents to Yara compiler" << "\nError count:" << result << std::endl;
        cleanup();
        return EXIT_FAILURE;
    }

    // Compile the rules
    result = yr_compiler_get_rules(compiler, &rules);
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to compile Yara rules" << std::endl;
        cleanup();
        return EXIT_FAILURE;
    }

    // Run the virus scan (the last argument is the timeout in seconds)
    result = yr_rules_scan_fd(rules, vFile, SCAN_FLAGS_FAST_MODE, my_callback_function, NULL, 1000);
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to scan file with Yara rules" << std::endl;
        cleanup();
        return EXIT_FAILURE;
    }

    // Release resources when done
    cleanup();

    return EXIT_SUCCESS;
}

Whatever should happen after a scan is implemented in the callback passed to the scan function.

The scan result is as follows:
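
Since all post-scan actions live in the callback, a product would usually collect results there rather than print them. The sketch below illustrates that pattern with the `user_data` pointer. It deliberately avoids libyara so that it compiles on its own: the `Demo*` enums are stand-ins, not the real yara.h constants, and a plain C string stands in for the rule object that libyara passes as message data (in libyara that is a `YR_RULE*`, whose `identifier` field holds the rule name).

```cpp
#include <string>
#include <vector>

// Stand-in message and return codes, NOT the real yara.h definitions.
enum DemoMessage { kRuleMatching, kRuleNotMatching, kScanFinished };
enum DemoReturn { kContinue, kAbort };

// Everything the caller wants to know after the scan.
struct ScanReport {
    std::vector<std::string> matched_rules;
    bool finished = false;
};

// Same idea as the callbacks in the demos: state travels through user_data.
int collecting_callback(int message, const void* message_data, void* user_data) {
    ScanReport* report = static_cast<ScanReport*>(user_data);
    if (message == kRuleMatching) {
        // Record the rule name and keep scanning so every match is captured.
        report->matched_rules.push_back(static_cast<const char*>(message_data));
        return kContinue;
    }
    if (message == kScanFinished) {
        report->finished = true;
    }
    return kContinue;
}
```

After the scan returns, the caller inspects the `ScanReport`: an empty `matched_rules` with `finished` set means a clean file, which avoids inferring "not matched" from the order in which messages arrive.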

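Comparison of the two methods

  1. Performance comparison

The two methods are fundamentally the same, but repeatedly creating and destroying a new process for each scan adds overhead that hurts efficiency.

  2. Efficiency comparison

The sample beaconx64.dll (size: 281 KB) was copied 1000 times, giving 1024 samples in total (consistent with the 1000 copies plus the 24 original samples). Scanning the folder with yara five times gave the following run times: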

┌─[root@cars]─[/home/cars/workstation/cobaltstrike4.0-cracked/shellcode]
└──╼ #time yara -r GCTI/YARA/CobaltStrike/* payloads2|grep pattern
warning: rule "CobaltStrike_Resources_Artifact32svc_Exe_v1_49_to_v3_14" in GCTI/YARA/CobaltStrike/CobaltStrike__Resources_Artifact32svc_Exe_v1_49_to_v4_x.yara(47): string "$decoderFunc" may slow down scanning

real 0m0.653s
user 0m0.983s
sys 0m0.499s
┌─[✗]─[root@cars]─[/home/cars/workstation/cobaltstrike4.0-cracked/shellcode]
└──╼ #time yara -r GCTI/YARA/CobaltStrike/* payloads2|grep pattern
warning: rule "CobaltStrike_Resources_Artifact32svc_Exe_v1_49_to_v3_14" in GCTI/YARA/CobaltStrike/CobaltStrike__Resources_Artifact32svc_Exe_v1_49_to_v4_x.yara(47): string "$decoderFunc" may slow down scanning

real 0m0.603s
user 0m1.008s
sys 0m0.436s
┌─[✗]─[root@cars]─[/home/cars/workstation/cobaltstrike4.0-cracked/shellcode]
└──╼ #time yara -r GCTI/YARA/CobaltStrike/* payloads2|grep pattern
warning: rule "CobaltStrike_Resources_Artifact32svc_Exe_v1_49_to_v3_14" in GCTI/YARA/CobaltStrike/CobaltStrike__Resources_Artifact32svc_Exe_v1_49_to_v4_x.yara(47): string "$decoderFunc" may slow down scanning

real 0m0.648s
user 0m1.040s
sys 0m0.456s
┌─[✗]─[root@cars]─[/home/cars/workstation/cobaltstrike4.0-cracked/shellcode]
└──╼ #time yara -r GCTI/YARA/CobaltStrike/* payloads2|grep pattern
warning: rule "CobaltStrike_Resources_Artifact32svc_Exe_v1_49_to_v3_14" in GCTI/YARA/CobaltStrike/CobaltStrike__Resources_Artifact32svc_Exe_v1_49_to_v4_x.yara(47): string "$decoderFunc" may slow down scanning

real 0m0.620s
user 0m1.066s
sys 0m0.416s
┌─[✗]─[root@cars]─[/home/cars/workstation/cobaltstrike4.0-cracked/shellcode]
└──╼ #time yara -r GCTI/YARA/CobaltStrike/* payloads2|grep pattern
warning: rule "CobaltStrike_Resources_Artifact32svc_Exe_v1_49_to_v3_14" in GCTI/YARA/CobaltStrike/CobaltStrike__Resources_Artifact32svc_Exe_v1_49_to_v4_x.yara(47): string "$decoderFunc" may slow down scanning

real 0m0.594s
user 0m1.008s
sys 0m0.446s

Taking the mean of the user times as the average gives 1.02 s (the mean real, i.e. wall-clock, time is about 0.62 s).

A demo that uses the yara C API to load rules in bulk and scan files in bulk:

#include <iostream>
#include <string>
#include <windows.h>
#include <yara.h>

#define RULES_DIR "D:\\workstation\\yara_demo\\yara_demo\\rules"
#define SCAN_DIR "D:\\workstation\\yara_demo\\yara_demo\\scan"

int my_callback_function(YR_SCAN_CONTEXT* context, int message, void* message_data, void* user_data)
{
    if (message == CALLBACK_MSG_RULE_MATCHING) {
        std::cout << "Matched!" << std::endl;
        return CALLBACK_ABORT;
    }

    if (message == CALLBACK_MSG_SCAN_FINISHED) {
        LARGE_INTEGER liFinishTime;              // end time
        QueryPerformanceCounter(&liFinishTime);  // read the end time
        LARGE_INTEGER liFrequency;               // timer frequency
        QueryPerformanceFrequency(&liFrequency);
        // Time elapsed since the start time passed in through user_data
        double elapsedTime = static_cast<double>(liFinishTime.QuadPart - *reinterpret_cast<LONGLONG*>(user_data)) / liFrequency.QuadPart;
        std::cout << "Not Matched! Elapsed Time: " << elapsedTime << " s" << std::endl;
        return CALLBACK_ABORT;
    }

    return CALLBACK_CONTINUE;
}

int main(int argc, char** argv)
{
    // Open the rule directory
    WIN32_FIND_DATAA find_data;
    HANDLE hFind = FindFirstFileA((std::string(RULES_DIR) + "\\*").c_str(), &find_data);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "Failed to open rule directory" << std::endl;
        return EXIT_FAILURE;
    }

    // Initialize the YARA engine
    int result = yr_initialize();
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to initialize Yara engine" << std::endl;
        FindClose(hFind);
        return EXIT_FAILURE;
    }

    // Create the compiler object
    YR_COMPILER* compiler;
    result = yr_compiler_create(&compiler);
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to create Yara compiler" << std::endl;
        FindClose(hFind);
        yr_finalize();
        return EXIT_FAILURE;
    }

    // Walk the rule directory and load every rule file
    do {
        if (!(find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            std::string rule_file = std::string(RULES_DIR) + "\\" + find_data.cFileName;
            HANDLE hFile = CreateFileA(rule_file.c_str(),
                GENERIC_READ,
                0,
                NULL,
                OPEN_EXISTING,
                0,
                NULL);
            if (hFile == INVALID_HANDLE_VALUE) {
                std::cerr << "Failed to open rule file for reading: " << rule_file << std::endl;
                continue;
            }
            // yr_compiler_add_fd returns the number of errors found
            result = yr_compiler_add_fd(compiler, hFile, NULL, NULL);
            CloseHandle(hFile);  // the handle is no longer needed either way
            if (result != 0) {
                std::cerr << "Failed to add file contents to Yara compiler: " << rule_file << "\nError count:" << result << std::endl;
                continue;
            }
        }
    } while (FindNextFileA(hFind, &find_data));

    FindClose(hFind);

    // Compile the rules
    YR_RULES* rules;
    result = yr_compiler_get_rules(compiler, &rules);
    if (result != ERROR_SUCCESS) {
        std::cerr << "Failed to compile Yara rules" << std::endl;
        yr_compiler_destroy(compiler);
        yr_finalize();
        return EXIT_FAILURE;
    }

    // Scan every file in the scan directory
    hFind = FindFirstFileA((std::string(SCAN_DIR) + "\\*").c_str(), &find_data);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "Failed to open scan directory" << std::endl;
        yr_rules_destroy(rules);
        yr_compiler_destroy(compiler);
        yr_finalize();
        return EXIT_FAILURE;
    }

    LARGE_INTEGER liStartTime;              // start time
    QueryPerformanceCounter(&liStartTime);  // read the start time
    do {
        if (!(find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            std::string scan_file = std::string(SCAN_DIR) + "\\" + find_data.cFileName;
            // Pass a pointer to the start time to the callback through user_data
            result = yr_rules_scan_file(rules, scan_file.c_str(), SCAN_FLAGS_FAST_MODE, my_callback_function, &liStartTime.QuadPart, 1000);
            if (result != ERROR_SUCCESS) {
                std::cerr << "Failed to scan file with Yara rules: " << scan_file << std::endl;
                continue;
            }
        }
    } while (FindNextFileA(hFind, &find_data));

    FindClose(hFind);

    // Release resources when done
    yr_rules_destroy(rules);
    yr_compiler_destroy(compiler);
    yr_finalize();

    return EXIT_SUCCESS;
}

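No multithreading or other speed-up technique is used here, so the total time is fairly long: with the same number of rules and the same number of files, the total scan time is about 3 s.

As can be seen, without any efficiency optimizations the yara C API demo is slower than directly invoking the official yara binary. The two figures are not strictly comparable, though: the command-line timing above was taken on Linux and is CPU (user) time, whereas this demo runs on Windows and reports wall-clock time.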
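
One of the optimizations alluded to above is multithreading. The following is a minimal sketch of how the per-file loop could be spread over worker threads. It is independent of libyara: `scan_one` is a hypothetical wrapper the caller supplies around the actual scan call, and whether compiled rules may be shared between threads should be checked against the libyara documentation for the version in use.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Calls scan_one once for every path, using worker_count threads that
// pull the next unprocessed index from a shared atomic counter.
void scan_in_parallel(const std::vector<std::string>& paths,
                      unsigned worker_count,
                      const std::function<void(const std::string&)>& scan_one) {
    std::atomic<std::size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= paths.size()) return;  // nothing left to scan
            scan_one(paths[i]);
        }
    };
    if (worker_count == 0) worker_count = 1;  // always run at least one worker
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < worker_count; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();
}
```

Pulling indices from a shared counter, rather than splitting the list into fixed chunks, keeps the workers evenly loaded when some files take longer to scan than others. Any state touched inside `scan_one`, such as a shared result list, would need its own synchronization.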