从0开始搭建日志分析系统（二）

从0开始搭建日志分析系统（二）：日志分析

闲话

前一篇文章我们已经收集到日志了，并且每天有定时任务把当天的日志切割压缩，接下来是对每天的日志进行分析
最开始的思路是使用nodejs提供的fs模块来按行读取文件，然后分析过滤每一行，最后对结果以html方式输出，
但是在实际操作中发现fs读取文件太慢（200m左右的大概需要一个半小时，如果整个过程一切顺利还好，就怕跑
到90%了挂了，又得重新来，白白浪费时间），在逐步的改进过程中，发现linux自带很多日志操作命令能跨世纪
般的提高效率，使用linux命令能把200m左右的日志分析时间减少到1分钟之内，在此不得不感叹linux的强大。
参考：
http://blog.csdn.net/teamlet/article/details/38046409/
https://www.cnblogs.com/yougewe/p/5173635.html
主要使用命令：
cat grep sed sort uniq

一、部署项目https://github.com/17it/logfactory.git

clone项目到任意目录
思路：
1.使用cat命令，按照指定规则先过滤日志
1
cat out.log | grep "xx" > xx-first.log
2.sed命令替换掉每行日志前面的时间（此处时间格式可能不一致）
1
sed 's/^....\-..\-.. ..\:..\: //g' xx-first.log > xx-second.log
3.sort命令进行排序
1
sort xx-second.log > xx-tmp.log
4.uniq -c去重，并统计相同日志行的数量
1
uniq -c xx-tmp.log > xx-uniq.log
5.再次使用sort命令排序去重后的日志（按照数量降序 -nr）
1
sort -nr xx-uniq.log > xx-first.log
6.sed命令替换每行头尾空格，替换成js里数组的形式，方便后续使用
1
sed -e 's/ *$/\",/' xx-first.log > xx-second.log && sed -e 's/^ */\"/' xx-second.log > xx.log
7.移除过程中生成的-*.log临时文件
1
rm -f xx-*.log"

核心代码：

var filename = path.join(__dirname, '../src/') + self.name;
var exec = require('child_process').exec;
var cmdstr = "cat " + path.join(__dirname, "../out_log.log") + " | grep " + self.name + ":http > " + filename + "-first.log && sed 's/^....\-..\-.. ..\:..\: //g' " + filename + "-first.log > " + filename + "-second.log && sort " + filename + "-second.log > " + filename + "-tmp.log && uniq -c " + filename + "-tmp.log > " + filename + "-uniq.log && sort -nr " + filename + "-uniq.log > " + filename + "-first.log && sed -e 's/ *$/\",/' " + filename + "-first.log > " + filename + "-second.log && sed -e 's/^  */\"/' " + filename + "-second.log > " + filename + ".log && rm -f " + filename + "-*.log";
exec(cmdstr, function(err,stdout,stderr){
    if(err) {
        console.log('cat out_log.log error: '+ stderr);
        return;
    }
    console.log(stdout);
    if(files.length > 0) {
        transFile.init(files[0]);
    }else {
        self.second();
    }
});

可能需要修改代码里的目录或者筛选名，然后直接执行node main/start
接下来静静的等待静态html的生成吧,因为每次会生成新的html文件，所以这里需要配置一个静态文件目录（/root/study），以便访问

server {
    listen       80;
    #listen       [::]:80 default_server;
    server_name  17it.ren;
    autoindex on;
    root         /root/study;

    # Load configuration files for the default server block.
    #include /etc/nginx/default.d/*.conf;

    location / {
      root /root/study;
      index index.html;
      autoindex on;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}