Scanning with the HBase shell

Does anyone know how to scan records based on some scan filter, i.e.:

column:something = "somevalue"

Something like this, but from the HBase shell?

5 solutions collected from the web for "Scanning with the HBase shell"

Try this. It's a bit ugly, but it works for me.

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes

    scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
      SingleColumnValueFilter.new(
        Bytes.toBytes('family'),
        Bytes.toBytes('qualifier'),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new('somevalue')) }

The HBase shell will load whatever is in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert; improvements are welcome):

    # imports like above
    def scan_substr(table, family, qualifier, substr, *cols)
      scan table, { COLUMNS => cols, FILTER =>
        SingleColumnValueFilter.new(
          Bytes.toBytes(family),
          Bytes.toBytes(qualifier),
          CompareFilter::CompareOp.valueOf('EQUAL'),
          SubstringComparator.new(substr)) }
    end

Then you can just say in the shell:

    scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'

    scan 'test', {COLUMNS => ['F'], FILTER => \
    "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
    (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

More information can be found here. Note that several examples are in the attached Filter Language.docx file.
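
Typing those filter strings by hand gets error-prone once there are several conditions, so here is a minimal sketch of building them in plain Ruby. The helper names (`fl_quote`, `scvf`, `all_of`, `any_of`) are made up for this example and are not part of HBase. As far as I understand the filter language, the two trailing booleans of SingleColumnValueFilter are "filter if column missing" and "latest version only"; treat that as an assumption and check it against your HBase version.

```ruby
# Hypothetical helpers (not part of HBase) that assemble filter-language strings.

# Wrap a value in single quotes. This sketch does not attempt to escape
# embedded single quotes, so it simply rejects them.
def fl_quote(value)
  s = value.to_s
  raise ArgumentError, "single quotes are not handled in this sketch" if s.include?("'")
  "'#{s}'"
end

# Build one SingleColumnValueFilter(...) clause.
# The two keyword arguments map to the two trailing booleans (assumed meaning).
def scvf(family, qualifier, op, comparator, filter_if_missing: true, latest_version_only: true)
  args = [fl_quote(family), fl_quote(qualifier), op, fl_quote(comparator),
          filter_if_missing, latest_version_only]
  "SingleColumnValueFilter(#{args.join(',')})"
end

# Combine clauses so that every one of them must match.
def all_of(*filters)
  filters.map { |f| "(#{f})" }.join(' AND ')
end

# Combine clauses so that at least one of them must match.
def any_of(*filters)
  filters.map { |f| "(#{f})" }.join(' OR ')
end

filter = all_of(
  scvf('F', 'u', '=', 'regexstring:http:.*pdf'),
  scvf('F', 's', '=', 'binary:2')
)
puts filter
```

If these helpers live in ~/.irbrc, the resulting string can be passed straight to the shell: `scan 'test', {COLUMNS => ['F'], FILTER => filter}`.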

Use the FILTER parameter of scan, as shown in the usage help:

    hbase(main):002:0> scan

    ERROR: wrong number of arguments (0 for 1)

    Here is some help for this command:
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications. Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    or COLUMNS. If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.

    Some examples:

      hbase> scan '.META.'
      hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false). By
    default it is enabled. Examples:

      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    Scan scan = new Scan();

    // With multiple SingleColumnValueFilters, choose whether a row must pass
    // all conditions (MUST_PASS_ALL) or just one of them (MUST_PASS_ONE).
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

    SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
        Bytes.toBytes("SOME COLUMN FAMILY"),
        Bytes.toBytes("SOME COLUMN NAME"),
        CompareOp.EQUAL,
        Bytes.toBytes("SOME VALUE"));

    // Use this if you don't want rows that are missing the column. Adding the
    // column filter alone does not keep such rows out of the result set; they
    // will be included unless you add this statement.
    filter_by_name.setFilterIfMissing(true);

    list.addFilter(filter_by_name);
    scan.setFilter(list);
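
The point about rows that lack the column is easy to miss, so here is a toy model of it in plain Ruby. This is not HBase code: rows are ordinary hashes, and `passes?` is a hypothetical stand-in for the per-row decision that SingleColumnValueFilter makes, with `filter_if_missing` playing the role of `setFilterIfMissing`.

```ruby
# Plain-Ruby model of the row-level decision; nothing here talks to HBase.
# Each row is a Hash mapping "family:qualifier" to a value.
def passes?(row, column, expected, filter_if_missing: false)
  # A row without the column is kept unless filter_if_missing is set.
  return !filter_if_missing unless row.key?(column)
  row[column] == expected
end

rows = {
  'row1' => { 'cf:name' => 'alice' },
  'row2' => { 'cf:name' => 'bob' },
  'row3' => { 'cf:age'  => '30' }   # has no cf:name column
}

default_keys = rows.select { |_, r| passes?(r, 'cf:name', 'alice') }.keys
strict_keys  = rows.select { |_, r| passes?(r, 'cf:name', 'alice', filter_if_missing: true) }.keys

p default_keys  # row1 and row3: the row missing the column slips through
p strict_keys   # row1 only
```

In the shell's filter-string syntax the same switch is, as far as I can tell, the first of the two trailing booleans in `SingleColumnValueFilter(...)`.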

One of the filters is ValueFilter, which can be used to filter on all column values.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

binary is one of the comparators used in filters. You can use different comparators in a filter depending on what you want to do.
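
To make the choice of comparator concrete, here is a small sketch in plain Ruby that builds ValueFilter strings. `value_filter` is a hypothetical helper, not an HBase function. The comparator prefixes (binary, binaryprefix, regexstring, substring) and the rule that regexstring and substring only work with = and != come from my reading of the HBase filter language documentation, so verify them for your version.

```ruby
# Hypothetical helper that builds a ValueFilter string for the shell's FILTER option.
COMPARATORS = %w[binary binaryprefix regexstring substring].freeze
OPERATORS   = %w[< <= = != >= >].freeze

def value_filter(op, comparator, value)
  raise ArgumentError, "unknown operator #{op}" unless OPERATORS.include?(op)
  raise ArgumentError, "unknown comparator #{comparator}" unless COMPARATORS.include?(comparator)
  # Assumed restriction: pattern-style comparators only make sense with = and !=.
  if %w[regexstring substring].include?(comparator) && !%w[= !=].include?(op)
    raise ArgumentError, "#{comparator} only supports = and !="
  end
  # Values containing single quotes are not handled by this sketch.
  "ValueFilter(#{op},'#{comparator}:#{value}')"
end

puts value_filter('=',  'binary',       '2016-01-26')  # exact match
puts value_filter('=',  'binaryprefix', '2016-01')     # value starts with 2016-01
puts value_filter('=',  'substring',    '-01-')        # value contains -01-
puts value_filter('!=', 'regexstring',  '^2015')       # value does not match the regex
```

Any of the resulting strings can then be used as in the example above: `scan 'dummytable', {FILTER => value_filter('=', 'binaryprefix', '2016-01')}`.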

You can refer to the following URL: http://www.hadooptpoint.com/filters-in-hbase-shell/. It has good examples of how to use the different filters in the HBase shell.
