TheGodOfObelisk/Knowledge-Graph-Analyze

Knowledge Graph + Network Security

Knowledge-Graph-Analyze

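Attempt 1:

The preliminary results from Bro and Snort, such as network packets and low-level network events, are stored in the knowledge graph. The knowledge graph then runs its analysis on top of this data.
Open questions:
What kind of analysis exactly, and what results should it produce?
Is it the "multi-step attack" detection the advisor mentioned?
Does the storage format of the data in the knowledge graph need to change to meet the needs of the analysis?
On low-level network events:
For basic network events, Bro generates many log files, most of them named after a protocol (their content is essentially the traffic related to that protocol). There are also more special log files such as notice.log, whose content can be customized by adding notice types. For now, we treat the entries recorded in notice.log as the so-called basic network events.
conn.log stores the log of connections in the network. Establishing a connection is itself a kind of event, so does everything that Bro organizes and writes out as a log count as an event?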
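To make the log discussion concrete, here is a minimal sketch of reading a Bro/Zeek tab-separated log (such as notice.log or conn.log) into Python dictionaries, one per record. The function name `parse_zeek_log` and the sample lines are made up for illustration, and the sample keeps only a trimmed subset of the columns a real notice.log contains.

```python
import io

def parse_zeek_log(lines):
    """Yield one dict per record from a Bro/Zeek TSV log.

    Header lines start with '#'; the '#fields' line names the columns.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The first token is the literal '#fields'; the rest are column names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))

# Hypothetical sample with a reduced column set (not taken from a real capture).
SAMPLE = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tnote\tmsg\n"
    "1552000000.0\tC1\t10.0.0.5\t172.16.115.20\tScan::Address_Scan\tsweep\n"
)

records = list(parse_zeek_log(io.StringIO(SAMPLE)))
for record in records:
    print(record["note"], record["id.orig_h"], "->", record["id.resp_h"])
```

Each resulting dictionary could then become one event vertex (or edge) in the graph, whether it comes from notice.log or from conn.log.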
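On network packets:
Network packets should be the rawest form of network traffic, with no upper-layer analysis applied. In Packet Logger mode, Snort records exactly these network packets.
On the knowledge graph's analysis and reasoning capabilities:
According to Chapter 8 of "Cyberspace Security Defense and Situational Awareness" (《网络空间安全防御与态势感知》), preliminary analysis and reasoning about network events requires an "ontology model"; the chapter mentions the OWL model. So does our data also need to be processed and converted into OWL form to make analysis and reasoning easier?
On knowledge graph storage:
We currently store the knowledge in a MySQL database. This traditional relational storage is far from the semantic storage a knowledge graph requires. We are considering D2RQ to convert the relational data into an RDF representation.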
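As a rough illustration of what a relational-to-RDF conversion produces, the sketch below turns one relational row into N-Triples lines: the row becomes a resource, the table becomes its class, and each column becomes a property. This is not D2RQ's API (D2RQ is a separate tool driven by its own mapping files); the namespace `http://example.org/kg/`, the `host` table, and its columns are all assumptions for illustration.

```python
BASE = "http://example.org/kg/"  # hypothetical namespace, not from the project
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def row_to_triples(table, key, row):
    """Map one relational row to N-Triples lines.

    The row becomes a subject IRI, the table name becomes its class,
    and every non-key, non-null column becomes a literal-valued property.
    """
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."]
    for column, value in row.items():
        if column == key or value is None:
            continue
        # Escape backslashes and double quotes inside the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{BASE}{table}#{column}> "{literal}" .')
    return triples

# Hypothetical row from a hypothetical 'host' table.
rows = [{"id": 1, "ip": "172.16.115.20", "os": "Solaris"}]
triples = [t for r in rows for t in row_to_triples("host", "id", r)]
print("\n".join(triples))
```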

Dataset Selection

We are considering DARPA's LLS_DDOS, a DDoS attack dataset that divides the attack into five phases [1]:
(1) Pre-probing the network (IPSweep);
IPsweep of the AFB from a remote site
The adversary performs a scripted IPsweep of multiple class C subnets on the Air Force Base. The following networks are swept from address 1 to 254: 172.16.115.0/24, 172.16.114.0/24, 172.16.113.0/24, 172.16.112.0/24. The attacker sends ICMP echo-requests in this sweep and listens for ICMP echo-replies to determine which hosts are "up".
(2) Port scanning to determine which hosts are vulnerable (PortScan);
Probe of live IPs to look for the sadmind daemon running on Solaris hosts
The hosts discovered in the previous phase are probed to determine which hosts are running the "sadmind" remote administration tool. This tells the attacker which hosts might be vulnerable to the exploit that he/she has. Each host is probed, by the script, using the "ping" option of the sadmind exploit program, as provided on the Internet by "Cheez Whiz". The ping option makes an RPC request to the host in question, asks what TCP port number to connect to for the sadmind service, and then connects to the port number supplied to test to see if the daemon is listening.
(3) Gaining administrator privileges (FTPBufOverflow);
Breakins via the sadmind vulnerability, both successful and unsuccessful on those hosts
The attacker then tries to break into the hosts found to be running the sadmind service in the previous phase. The attack script attempts the sadmind Remote-to-Root exploit several times against each host, each time with different parameters. Since this is a remote buffer-overflow attack, the exploit code cannot easily determine the appropriate stack pointer value as in a local buffer-overflow. Thus the adversary must try several different stack pointer values, each of which he/she has validated to work on some test machines. There are three stack pointer values attempted on each potential victim. With each attempt, the exploit tries to execute one command, as root, on the remote system. The attacker needs to execute two commands however, one to "cat" an entry onto the victim's /etc/passwd file and one to "cat" an entry onto the victim's /etc/shadow file. The new root user's name is 'hacker2' and hacker2's home directory is set to be /tmp. Thus, there are 6 exploit attempts on each potential victim host. To test whether or not a break-in was successful, the attack script attempts a login, via telnet, as hacker2, after each set of two break-in attempts. When successful, the attacker's script moves on to the next potential victim.
(4) Installing the Mstream DDoS trojan software (UploadSoftware);
Installation of the trojan mstream DDoS software on three hosts at the AFB
Entering this phase, the attack script has built a list of those hosts on which it has successfully installed the 'hacker2' user. These are mill (172.16.115.20), pascal (172.16.112.50), and locke (172.16.112.10). For each host on this list, the script performs a telnet login, makes a directory on the victim called "/tmp/.mstream/" and uses rcp to copy the server-sol binary into the new directory. This is the mstream server software. The attacker also installs a ".rhosts" file for themselves in /tmp, so that they can rsh in to start up the binary programs. On the first victim on the list, the attacker also installs the "master-sol" software, which is the mstream master. After installing the software on each host, the attacker uses rsh to start up first the master, and then the servers. As they come up, each server "registers" with the master that it is alive. The master writes out a database of live servers to a file called "/tmp/.sr".
(5) Using the compromised hosts to launch a DDoS attack against a remote server (DDOSAttack);
Launching the DDoS
In the final phase, the attacker manually launches the DDoS. This is performed via a telnet login to the victim on which the master is running, and then, from the victim, a "telnet" to port 6723 of the localhost. Port 6723/TCP is the port on which the master listens for connections to its user-interface. After entering a password for the user-interface, the attacker is given a prompt at which he/she enters two commands. The command "servers" causes the UI to list the mstream servers which have registered with it and are ready to attack. The command "mstream 131.84.1.31 5" causes a DDoS attack, of 5 second duration, against the given IP address to be launched by all three servers simultaneously. The mstream DDoS consists of many, many connection requests to a variety of ports on the victim. All packets have a spoofed, random source IP address. The attacker then logs out. The tiny duration was chosen so that it would be possible to easily distribute tcpdump and audit logs of these events, to avoid them being too large. In real life, one might expect a DDoS of longer duration, several hours or more.
In the case of this scenario, however, it should be noted that the DDoS does not exactly succeed. The Mstream DDoS software attempts to flood the victim with ACK packets that go to many random TCP ports on the victim host. The Air Force Base firewall, the Sidewinder firewall, is not configured to pass traffic on all these ports, thus the only mstream packets that make it through the firewall are those on well-known ports. All other mstream packets result in a TCP reset being sent to the spoofed source address. Thus in the DMZ dump file, one sees many resets apparently coming from "www.af.mil" going to the many spoofed source addresses. These are actually created by the firewall as a result of the receipt of the TCP packet for which the firewall is configured not to proxy!
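Phase (1) of the scenario above can be spotted with a very simple heuristic: count how many distinct addresses in one /24 subnet a single source sends ICMP echo-requests to. The sketch below assumes the echo-requests have already been extracted as (source, destination) pairs; the function name, the threshold of 20, and the synthetic traffic are assumptions, not values from the dataset.

```python
from collections import defaultdict
import ipaddress

def detect_ip_sweeps(echo_requests, threshold=20):
    """Flag sources that ping many distinct hosts in the same /24.

    echo_requests: iterable of (src_ip, dst_ip) strings, one per ICMP echo-request.
    Returns {(src_ip, subnet): number_of_distinct_targets} for pairs whose
    count reaches the threshold.
    """
    targets = defaultdict(set)
    for src, dst in echo_requests:
        # strict=False lets us pass a host address and get its enclosing /24.
        subnet = ipaddress.ip_network(f"{dst}/24", strict=False)
        targets[(src, subnet)].add(dst)
    return {key: len(hosts) for key, hosts in targets.items()
            if len(hosts) >= threshold}

# Synthetic traffic: one source sweeps addresses 1..254, another pings one host.
sweep = [("10.9.9.9", f"172.16.115.{i}") for i in range(1, 255)]
normal = [("10.1.1.1", "172.16.112.50")] * 5
alerts = detect_ip_sweeps(sweep + normal)
print(alerts)
```

A real detector would also bound the count by a time window; that is left out here to keep the sketch short.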

[1] Hu Qian (胡倩). Attack prediction method based on multi-step attack scenarios (基于多步攻击场景的攻击预测方法) [J]. Computer Science (计算机科学), 2019, 46(S1): 365-369.

Method Exploration and Literature Reading

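Paper 1 [2]
A network security situation assessment model based on correlation analysis and HMM
Situation element extraction, situation understanding, and situation assessment together form a process in which basic static and dynamic information about the network information system and network security is progressively processed, through information fusion techniques, into information that network administrators can understand and use to make decisions. (Pattern: "XX is the process of XXX".)
In this process, the situation understanding stage clusters alerts through correlation analysis, and a hidden Markov model (HMM) is used to assess and predict the situation. (Correlation analysis => clustering? HMM still needs to be understood.)
Situation elements: asset information, network attack alert information, and asset vulnerability information.
In the situation understanding stage, the information extracted as situation elements undergoes host-oriented alert clustering and attack-pattern-oriented correlation analysis.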
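One simple reading of host-oriented alert clustering is to group alerts by target host and split each host's alerts wherever the gap between consecutive alerts exceeds a time window. The sketch below follows that reading only; it is not the method from the paper, and the field names (`ts`, `dst`, `type`), the 60-second window, and the sample alerts are assumptions.

```python
from collections import defaultdict

def cluster_alerts_by_host(alerts, window=60.0):
    """Group alerts per target host, splitting on time gaps larger than `window`.

    alerts: list of dicts with 'ts' (seconds), 'dst' (target host), 'type'.
    Returns a list of (host, [alerts...]) clusters.
    """
    by_host = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        by_host[alert["dst"]].append(alert)

    clusters = []
    for host, items in by_host.items():
        current = [items[0]]
        for alert in items[1:]:
            if alert["ts"] - current[-1]["ts"] <= window:
                current.append(alert)       # same burst of activity
            else:
                clusters.append((host, current))
                current = [alert]           # gap too large: start a new cluster
        clusters.append((host, current))
    return clusters

# Hypothetical alerts for illustration only.
alerts = [
    {"ts": 0.0, "dst": "A", "type": "IPSweep"},
    {"ts": 10.0, "dst": "A", "type": "PortScan"},
    {"ts": 5.0, "dst": "B", "type": "IPSweep"},
    {"ts": 200.0, "dst": "A", "type": "DDOSAttack"},
]
clusters = cluster_alerts_by_host(alerts)
for host, group in clusters:
    print(host, [a["type"] for a in group])
```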
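In the situation assessment stage, the paper uses an HMM-based assessment method: the threat level of the attacks serves as the observation, and the situation is the hidden state to be assessed.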
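The idea of treating threat levels as observations and the situation as a hidden state can be sketched with HMM forward filtering, which keeps a probability distribution over the hidden state after each observation. The two states, the two threat levels, and every probability below are invented for illustration and are not taken from the paper.

```python
def filter_situation(observations, states, start, trans, emit):
    """HMM forward filtering: P(state_t | observations up to t) for each step t."""
    def normalize(weights):
        total = sum(weights.values())
        return {s: w / total for s, w in weights.items()}

    # Initial belief: prior times likelihood of the first observation.
    belief = normalize({s: start[s] * emit[s][observations[0]] for s in states})
    history = [belief]
    for obs in observations[1:]:
        # Predict: push the belief through the transition model.
        predicted = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
        # Update: weight by how likely each state is to emit this observation.
        belief = normalize({s: predicted[s] * emit[s][obs] for s in states})
        history.append(belief)
    return history

# All numbers are assumptions chosen only to make the example run.
STATES = ["safe", "attacked"]
START = {"safe": 0.8, "attacked": 0.2}
TRANS = {"safe": {"safe": 0.9, "attacked": 0.1},
         "attacked": {"safe": 0.2, "attacked": 0.8}}
EMIT = {"safe": {"low": 0.9, "high": 0.1},
        "attacked": {"low": 0.3, "high": 0.7}}

history = filter_situation(["low", "high", "high"], STATES, START, TRANS, EMIT)
for step, belief in enumerate(history):
    print(step, {s: round(p, 3) for s, p in belief.items()})
```

With these made-up parameters, the probability of the "attacked" state rises as high-threat observations accumulate, which is the behaviour an assessment step would rely on.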
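Ideas to try: three steps. First, starting from traffic, use knowledge graph analysis to produce alerts. Second, following Paper 1, run correlation analysis on the alerts. Third, perform situation assessment on top of the correlation analysis. Note that all three steps must be tied to the KG, so that the value of the "KG-based" idea is visible.
[2] Wu Jiantai, Liu Guangjie, Liu Weiwei, Dai Yuewei (吴建台,刘光杰,刘伟伟,戴跃伟). A network security situation assessment method based on correlation analysis and HMM (一种基于关联分析和HMM的网络安全态势评估方法) [J]. Computer and Modernization (计算机与现代化), 2018(06): 30-36.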

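Practical Exploration

We use HugeGraph, the open-source graph database developed by Baidu. HugeGraph supports Gremlin, uses a property graph representation, and provides graph computation APIs as well as HugeGraph-Studio for displaying graph data.
The graph has two main parts: first, network security knowledge (including the signature event graph); second, the dynamic graph generated from real network traffic (the object of analysis).
Approach:

  1. Graph construction. First, migrate the network security knowledge (from the original MySQL database; vertex properties and relationships need to be reconsidered). Next, focus on the Bro scripts and built-in logs to complete the dynamic graph. Finally, build the signature event graph (this must be designed together with the graph analysis algorithm below).
  2. Graph analysis algorithm. Discover attack events in the dynamic graph (that is, match attack events); this is the first difficulty. Consider the following points. First, in a real scenario the object of analysis changes dynamically, and dynamic graph matching algorithms have high complexity. Second, the matching here is not strict graph matching but pattern matching, and it must consider not only topology but also vertex and edge properties (the content of the graph). Completing this step completes the task of discovering attack events based on behavior (the attack events are not yet chained together).
  3. Attack event correlation analysis. Discovering attack events alone is not enough; the relationships between them must be found in order to uncover multi-step attacks. HMM is being considered here, and more work is needed. A foreseeable difficulty is how to connect the HMM approach to the KG, which is important.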
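Step 2 above, matching an attack pattern against the event graph on both topology and properties, can be sketched as a small backtracking search over an in-memory property graph. This is exact matching on a static snapshot, so it does not address the dynamic or inexact matching described as the hard part, and it is not what subgraph_search.py does. The graph encoding, the labels, and the "ping then telnet" pattern are assumptions for illustration.

```python
def match_pattern(nodes, edges, p_nodes, p_edges):
    """Find all assignments of pattern vertices to distinct graph vertices.

    nodes:   {vertex_id: {property: value}}
    edges:   list of (src_id, label, dst_id)
    p_nodes: {pattern_vertex: {property: required_value}}
    p_edges: list of (pattern_src, label, pattern_dst)
    """
    edge_set = set(edges)
    order = list(p_nodes)

    def node_ok(pid, gid):
        # Every property required by the pattern vertex must match exactly.
        return all(nodes[gid].get(k) == v for k, v in p_nodes[pid].items())

    def edges_ok(assign):
        # Check only pattern edges whose two endpoints are already assigned.
        return all((assign[s], label, assign[d]) in edge_set
                   for s, label, d in p_edges if s in assign and d in assign)

    def extend(i, assign):
        if i == len(order):
            yield dict(assign)
            return
        pid = order[i]
        for gid in nodes:
            if gid in assign.values() or not node_ok(pid, gid):
                continue
            assign[pid] = gid
            if edges_ok(assign):
                yield from extend(i + 1, assign)
            del assign[pid]  # backtrack

    return list(extend(0, {}))

# Hypothetical event graph: h1 pings h2 and h3, then opens telnet to h2.
nodes = {"h1": {"label": "host"}, "h2": {"label": "host"}, "h3": {"label": "host"}}
edges = [("h1", "icmp_echo", "h2"), ("h1", "telnet", "h2"), ("h1", "icmp_echo", "h3")]

# Hypothetical pattern: an attacker pings a victim and also telnets into it.
p_nodes = {"attacker": {"label": "host"}, "victim": {"label": "host"}}
p_edges = [("attacker", "icmp_echo", "victim"), ("attacker", "telnet", "victim")]

matches = match_pattern(nodes, edges, p_nodes, p_edges)
print(matches)
```

The search is exponential in the worst case, which is one reason the notes flag matching on a changing graph as costly.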

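Purpose of Each File

addEdge.py: Apparently unused; it was written to test the "add edge" functionality.
attack_pattern.py: Imports the custom attack signatures (in attack_pattern_event.log) into the "signature graph".
generate_graph.py: Generates the network event graph.
initial_properties.py: Defines the vertex and edge properties, as well as the labels, needed in the graph.
initial.groovy: Unused.
put_method.py: Unused.
signature_test.bro: Unused.
subgraph_search.py: Performs the attack event search; correlation is not implemented yet (too few attack events have been defined!).
updateHost.bro: For the old Bro version; unused.
updateHost.zeek: Generates the log files needed to build the graph.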

About

Analyzing network security events with a knowledge graph
