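Git Blame: Statistics

How can I "abuse" blame (or some better-suited function, and/or in conjunction with shell commands) to get a statistic of how many lines of code currently in the repository originate from each committer?

Example output:

    Committer 1: 8046 Lines
    Committer 2: 4378 Lines

Update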

    git ls-tree -r -z --name-only HEAD -- */*.c | xargs -0 -n1 git blame \
        --line-porcelain HEAD | grep "^author " | sort | uniq -c | sort -nr

I updated some things along the way.

For convenience, you can also put this into its own command:

    #!/bin/bash

    # save as e.g. git-authors and set the executable flag
    git ls-tree -r -z --name-only HEAD -- $1 | xargs -0 -n1 git blame \
        --line-porcelain HEAD | grep "^author " | sort | uniq -c | sort -nr

Store this somewhere in your PATH, or modify your PATH, and use it like:

git authors '*/*.c' # look for all files recursively ending in .c
git authors '*/*.[ch]' # look for all files recursively ending in .c or .h
git authors 'Makefile' # just count lines of authors in the Makefile

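Original answer

While the accepted answer does the job, it is very slow.

    $ git ls-tree --name-only -z -r HEAD | egrep -z -Z -E '\.(cc|h|cpp|hpp|c|txt)$' \
        | xargs -0 -n1 git blame --line-porcelain | grep "^author " | sort | uniq -c | sort -nr

is almost instantaneous.

To get a list of the files currently tracked, you can use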

 git ls-tree --name-only -r HEAD

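This solution avoids calling file to determine the file type and, for performance reasons, uses grep to match the wanted extensions. If all files should be included, just remove the grep stage from the pipeline.

    grep -E '\.(cc|h|cpp|hpp|c)$' # for C/C++ files
    grep -E '\.py$'               # for Python files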
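To see what the extension filter keeps, here is a small sketch that runs it over a made-up file list rather than real git ls-tree output. The function name filter_c_sources and the sample paths are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: keep only paths ending in a C/C++ extension.
filter_c_sources() {
    grep -E '\.(cc|h|cpp|hpp|c)$'
}

# Made-up paths standing in for `git ls-tree --name-only -r HEAD` output.
printf 'src/a.c\nsrc/b.py\ninclude/c.h\nREADME\n' | filter_c_sources
```

Only src/a.c and include/c.h pass; the trailing $ in the pattern is what keeps a name like a.c.orig from matching.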

If file names can contain spaces, which cause trouble for shells, you can use:

 git ls-tree -z --name-only -r HEAD | egrep -Z -z '\.py'|xargs -0 ... # passes newlines as '\0'

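Given a list of files (through a pipe), one can use xargs to call a command and distribute the arguments. Commands that can process multiple files at once can omit the -n1. In this case we call git blame --line-porcelain, and each call receives exactly one argument.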

 xargs -n1 git blame --line-porcelain
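A quick way to convince yourself of the one-argument-per-call behaviour is to swap git blame for echo. This is only a sketch: run_per_file is a hypothetical helper name, the file names are made up, and -0 is assumed to be supported by your xargs (it is in the GNU and BSD versions).

```shell
# Hypothetical helper: run the given command once per NUL-separated name on stdin.
run_per_file() {
    xargs -0 -n1 "$@"
}

# echo stands in for `git blame --line-porcelain`; the file names are made up.
printf 'a.c\0dir/b c.c\0' | run_per_file echo blame:
```

Each name produces its own invocation, and the name containing a space stays a single argument because the input is NUL-separated.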

Then we filter the output for occurrences of "author ", sort the list, and count duplicate lines:

 grep "^author "|sort|uniq -c|sort -nr
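The counting stage can be tried without a repository by feeding it a few hand-written lines shaped like git blame --line-porcelain headers. In this sketch the function name count_authors and the sample authors are assumptions for illustration.

```shell
# Hypothetical helper: tally "author" header lines read from stdin.
count_authors() {
    grep "^author " | sort | uniq -c | sort -nr
}

# Hand-written sample; the "author-mail" line must not be counted,
# which is why the pattern ends with a space.
printf 'author Alice\nauthor-mail <alice@example.com>\nauthor Bob\nauthor Alice\n' \
    | count_authors
```

Alice is listed first with a count of 2, followed by Bob with 1. The exact padding in front of the counts depends on the uniq implementation.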

Note

Other answers actually filter out lines that contain only whitespace.

 grep -Pzo "author [^\n]*\n([^\n]*\n){10}[\w]*[^\w]"|grep "author "

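The command above will print the authors of lines containing at least one non-whitespace character. You can also use the match \w*[^\w#], which additionally excludes lines whose first non-whitespace character is a # (a comment in many scripting languages).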
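If grep -P is not available, the same idea can be sketched in awk. This is a swapped-in technique, not the command from the answer: it relies on the porcelain format putting each source line after its header block, prefixed by a TAB. The function name count_nonblank_authors and the sample input are assumptions.

```shell
# Hypothetical helper: count authors only for lines that have
# at least one non-whitespace character.
count_nonblank_authors() {
    awk '
        /^author /  { author = substr($0, 8) }          # remember the current author
        /^\t/       { if ($0 ~ /[^ \t]/) n[author]++ }  # content line: count if not blank
        END         { for (a in n) printf "%d\t%s\n", n[a], a }
    ' | sort -k1,1nr -k2
}

# Hand-written sample: one real line by Alice, one blank line each by Bob and Alice.
printf 'author Alice\n\tint x;\nauthor Bob\n\t   \nauthor Alice\n\t\n' \
    | count_nonblank_authors
```

With this input only Alice is reported, with a count of 1, since the other two content lines are empty or whitespace-only.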

I wrote a gem called git-fame that might be useful.

Installation and usage:

$ gem install git_fame
$ cd /path/to/gitdir
$ git fame

Output:

    Statistics based on master
    Active files: 21
    Active lines: 967
    Total commits: 109

    Note: Files matching MIME type image, binary has been ignored

    +----------------+-----+---------+-------+---------------------+
    | name           | loc | commits | files | distribution (%)    |
    +----------------+-----+---------+-------+---------------------+
    | Linus Oleander | 914 | 106     | 21    | 94.5 / 97.2 / 100.0 |
    | f1yegor        | 47  | 2       | 7     |  4.9 /  1.8 / 33.3  |
    | David Selassie | 6   | 1       | 2     |  0.6 /  0.9 /  9.5  |
    +----------------+-----+---------+-------+---------------------+

 git ls-tree -r HEAD|sed -re 's/^.{53}//'|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'|while read filename; do git blame -w "$filename"; done|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'|sort|uniq -c

Step-by-step explanation:

List all files under version control

 git ls-tree -r HEAD|sed -re 's/^.{53}//'

Prune the list down to text files only

 |while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'

Git blame all the text files, ignoring whitespace changes

 |while read filename; do git blame -w "$filename"; done

Pull out the author names

 |sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'

Sort the list of authors, and have uniq count the number of consecutively repeated lines

 |sort|uniq -c
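The author-extraction step is the least obvious one, so here is a sketch that applies the same sed expressions to a single hand-written line in the default git blame format. The function name extract_author and the sample line are assumptions, and -E is used in place of -r because both GNU and BSD sed accept it.

```shell
# Hypothetical helper: reduce default `git blame` lines to the author name.
extract_author() {
    # 1st expression: keep what sits between "(" and the YYYY-MM-DD date.
    # 2nd expression: strip the trailing spaces left over from column padding.
    sed -E -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'
}

# Hand-written sample line; the hash, author and date are made up.
echo 'a1b2c3d4 (Brian Ruby 2012-03-04 10:11:12 +0100 1) int main(void)' | extract_author
```

This prints Brian Ruby. Note that a source line that itself contains a date after a parenthesis can confuse the pattern, since the match is driven only by the parenthesis and the date.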

Example output:

       1334 Maneater
       1924 Another guy
      37195 Brian Ruby
       1482 Anna Lambda

 git summary --line

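from the git-extras package is exactly what you need.

Check out the documentation at git-extras - git-summary

It gives output that looks like this:

    project  : TestProject
    lines    : 13397
    authors  :
    8927 John Doe            66.6%
    4447 Jane Smith          33.2%
      23 Not Committed Yet   0.2%

Erik's solution is awesome, but I had some problems with diacritics (despite my LC_* environment variables being ostensibly set correctly), and noise leaked through on lines of code that actually had dates in them. My sed-fu is poor, so I ended up with this frankenstein snippet that includes ruby, but it works flawlessly for me on 200,000+ LOC, and it sorts the results:

    git ls-tree -r HEAD | gsed -re 's/^.{53}//' | \
    while read filename; do file "$filename"; done | \
    grep -E ': .*text' | gsed -r -e 's/: .*//' | \
    while read filename; do git blame "$filename"; done | \
    ruby -ne 'puts $1.strip if $_ =~ /^\w{8} \((.*?)\s*\d{4}-\d{2}-\d{2}/' | \
    sort | uniq -c | sort -rg

Also note gsed instead of sed, because that is the binary homebrew installs, leaving the system sed intact.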

git shortlog -sn

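This will show the number of commits per author.

Check out the gitstats command available from http://gitstats.sourceforge.net/

Here is the primary snippet from @Alex's answer that actually does the work of aggregating the blame lines. I have cut it down to operate on a single file rather than a set of files.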

 git blame --line-porcelain path/to/file.txt | grep "^author " | sort | uniq -c | sort -nr

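I am posting this here because I come back to this answer frequently, re-reading the post and re-digesting the examples to extract the part I value, which is taxing. It is also not generic enough for my use case; its scope is a whole C project.

I like to list stats per file, achieved with a bash for iterator instead of xargs, as I find xargs less readable and hard to use and memorize. The advantages and disadvantages of xargs vs for should be discussed elsewhere.

Here is a practical snippet that will show results for each file individually:

    for file in $(git ls-files); do \
        echo "$file"; \
        git blame --line-porcelain "$file" \
            | grep "^author " | sort | uniq -c | sort -nr; \
        echo; \
    done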

I also tested that running this straight in a bash shell is ctrl+c safe. If you need to put this inside a bash script, you might need to trap SIGINT and SIGTERM if you want the user to be able to break out of your for loop.

I have this solution that counts the blamed lines in all text files (excluding binary files, even the versioned ones):

    IFS=$'\n'
    for file in $(git ls-files); do
        git blame `git symbolic-ref --short HEAD` --line-porcelain "$file" | \
            grep "^author " | \
            grep -v "Binary file (standard input) matches" | \
            grep -v "Not Committed Yet" | \
            cut -d " " -f 2-
        done | \
            sort | \
            uniq -c | \
            sort -nr

I made my own script, which is a combination of @nilbus's and @Alex's answers:

    #!/bin/sh

    for f in $(git ls-tree -r --name-only HEAD --);
    do
        j=$(file "$f" | grep -E ': .*text' | sed -r -e 's/: .*//');
        if [ "$f" != "$j" ]; then
            continue;
        fi
        git blame -w --line-porcelain HEAD "$f" | grep "^author " | sed 's/author //'
    done | sort | uniq -c | sort -nr
