princehaku edited this page Oct 6, 2013 · 6 revisions

How To Use?

How do you use pyrailgun?

First, create a rule file for the site you want to crawl.

For example, testsite.json:

{
    "action": "main",
    "name": "vc cartoon",
    "subaction": [
        {
            "action": "fetcher",
            "url": "http://www.verycd.com/base/cartoon/page${0,1}${0,1}",
            "subaction": [
                {
                    "action": "parser",
                    "subaction": [
                        {
                            "action": "shell",
                            "subaction": [
                                {
                                    "action": "parser",
                                    "setField": "img",
                                    "rule": ".entry_cover .cover_img"
                                },
                                {
                                    "attr": "href",
                                    "action": "parser",
                                    "subaction": [
                                        {
                                            "action": "fetcher",
                                            "url": "http://www.verycd.com${#src}",
                                            "subaction": [
                                                {
                                                    "action": "parser",
                                                    "setField": "description",
                                                    "rule": "#contents_more",
                                                    "strip": "true"
                                                }
                                            ]
                                        }
                                    ],
                                    "setField": "src",
                                    "pos": 0,
                                    "rule": "a"
                                },
                                {
                                    "action": "parser",
                                    "setField": "score",
                                    "rule": ".entry_cover .score",
                                    "strip": "true"
                                },
                                {
                                    "action": "parser",
                                    "setField": "dest",
                                    "rule": ".bio a"
                                }
                            ],
                            "group": "default"
                        }
                    ],
                    "rule": ".entry_cover_list li"
                }
            ]
        }
    ]
}
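Because the rule file is one deeply nested tree, it can help to print an outline of it before running a crawl. The sketch below is not part of pyrailgun: `outline` and `collect_fields` are hypothetical helpers that use only the standard `json` module and the key names visible in the example above (`action`, `subaction`, `url`, `rule`, `setField`, `group`). It works on a cut-down copy of the rule tree and makes no claim about how pyrailgun itself interprets each action.

```python
import json

# A cut-down rule tree using the same keys as testsite.json above.
RULES = json.loads("""
{
    "action": "main",
    "name": "vc cartoon",
    "subaction": [
        {
            "action": "fetcher",
            "url": "http://www.verycd.com/base/cartoon/page${0,1}${0,1}",
            "subaction": [
                {
                    "action": "parser",
                    "rule": ".entry_cover_list li",
                    "subaction": [
                        {
                            "action": "shell",
                            "group": "default",
                            "subaction": [
                                {"action": "parser", "setField": "img",
                                 "rule": ".entry_cover .cover_img"}
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
""")


def outline(node, depth=0):
    """Return one indented line per action, parents before children."""
    parts = [node.get("action", "?")]
    for key in ("url", "rule", "setField", "group"):
        if key in node:
            parts.append("{}={}".format(key, node[key]))
    lines = ["  " * depth + " ".join(parts)]
    for child in node.get("subaction", []):
        lines.extend(outline(child, depth + 1))
    return lines


def collect_fields(node):
    """Return every setField name found in the tree, parents first."""
    fields = [node["setField"]] if "setField" in node else []
    for child in node.get("subaction", []):
        fields.extend(collect_fields(child))
    return fields


print("\n".join(outline(RULES)))
print(collect_fields(RULES))
```

To check the full file instead, replace the inline string with `json.load(open("testsite.json"))`. The names returned by `collect_fields` are the keys you should expect to see in each result node.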

Then, in your code, add the rule file to railgun as a task:

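from railgun import RailGun

railgun = RailGun()
# open() replaces the Python 2-only file() builtin
railgun.setTask(open("testsite.json"))
# run the task described by the rule file
railgun.fire()
# fetch the shells collected under the "default" group
nodes = railgun.getShells('default')
print(nodes)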

You then get a list of nodes containing all the parsed data:

[
    {img:xxx,src:xxx,score:xxx,dest:xxx,description:xxx},
    {img:xxx,src:xxx,score:xxx,dest:xxx,description:xxx}
]
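Once you have the node list, you can treat it like any other Python data. The sketch below assumes each node behaves like a plain dict with the five fields shown above. That is an assumption based on the printed output, not something confirmed against pyrailgun's source. The `nodes` list here is a stand-in with placeholder values, and `missing_fields` is a hypothetical helper.

```python
import json

# Stand-in for railgun.getShells('default'); values are placeholders.
nodes = [
    {"img": "xxx", "src": "xxx", "score": "xxx", "dest": "xxx",
     "description": "xxx"},
    {"img": "xxx", "src": "xxx", "score": "xxx", "dest": "xxx",
     "description": "xxx"},
]

EXPECTED = ("img", "src", "score", "dest", "description")


def missing_fields(node, expected=EXPECTED):
    """Return the expected field names that are absent or empty in a node."""
    return [name for name in expected if not node.get(name)]


# Keep only the nodes where every rule produced a value.
complete = [node for node in nodes if not missing_fields(node)]

# Serialize the result, for example to store it or hand it to another tool.
print(json.dumps(complete, indent=2, sort_keys=True))
```

Filtering on missing fields is a cheap way to notice when a selector in the rule file no longer matches the page, since the affected field would then be empty or absent.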