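T-SQL select query to remove non-numeric characters

I have dirty data in a column with a variable-length alphabetic prefix. I just want to strip out anything that is not 0-9.

I do not want to run a function or a procedure. I have a similar script that just grabs the numeric value after the text; it looks like this: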

    Update TableName
    set ColumntoUpdate = cast(replace(Columnofdirtydata, 'Alpha #', '') as int)
    where Columnofdirtydata like 'Alpha #%'
    And ColumntoUpdate is Null

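I thought it would work pretty well until I found that some of the data fields I assumed were in the format Alpha # 12345789 are not...

Examples of data that needs to be stripped: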

    AB ABCDE # 123
    ABCDE# 123
    AB: ABC# 123

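I just want the 123. All of the data fields do have the # immediately before the number.

I tried SUBSTRING and PATINDEX, but I could not get the syntax right. Does anyone have advice on the best way to approach this?

Thanks!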

See this blog post on extracting numbers from strings in SQL Server. Below is a sample using a string from your example:

    DECLARE @textval NVARCHAR(30)
    SET @textval = 'AB ABCDE # 123'

    SELECT LEFT(SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000),
                PATINDEX('%[^0-9.-]%', SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000) + 'X') - 1)

You can use STUFF and PATINDEX.

 stuff(Col, 1, patindex('%[0-9]%', Col)-1, '') 
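
As a minimal sketch of how that expression behaves, the following runs it against the sample values from the question. The table variable `@t`, the alias `Digits`, and the extra `'AB#'` row are assumptions added for illustration. Note that the expression only removes everything before the first digit, so any characters after the number are kept, and a value without any digit should come back as NULL because STUFF receives a negative length.

```sql
-- Hypothetical table variable holding the sample values from the question.
DECLARE @t TABLE (Col varchar(100));
INSERT INTO @t (Col)
VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123'), ('AB#');

-- PATINDEX finds the position of the first digit;
-- STUFF deletes every character in front of that position.
SELECT Col,
       STUFF(Col, 1, PATINDEX('%[0-9]%', Col) - 1, '') AS Digits
FROM @t;
```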

SQL Fiddle

If there may be some characters between the digits (e.g. thousands separators), you can try the following:

    declare @table table (DirtyCol varchar(100))
    insert into @table values
        ('AB ABCDE # 123')
        ,('ABCDE# 123')
        ,('AB: ABC# 123')
        ,('AB#')
        ,('AB # 1 000 000')
        ,('AB # 1`234`567')
        ,('AB # (9)(876)(543)')

    ;with tally as (select top (100) N=row_number() over (order by @@spid) from sys.all_columns),
    data as (
        select DirtyCol, Col
        from @table
            cross apply (
                select (select C + ''
                        from (select N, substring(DirtyCol, N, 1) C from tally where N<=datalength(DirtyCol)) [1]
                        where C between '0' and '9'
                        order by N
                        for xml path(''))
            ) p (Col)
        where p.Col is not NULL
    )
    select DirtyCol, cast(Col as int) IntCol
    from data

The output is:

    DirtyCol              IntCol
    --------------------- -------
    AB ABCDE # 123        123
    ABCDE# 123            123
    AB: ABC# 123          123
    AB # 1 000 000        1000000
    AB # 1`234`567        1234567
    AB # (9)(876)(543)    9876543

To update, add ColToUpdate to the select list of the data CTE:

    ;with num as (...),
    data as (
        select ColToUpdate, /*DirtyCol, */Col
        from ...
    )
    update data
    set ColToUpdate = cast(Col as int)

This works well for me:

    CREATE FUNCTION [dbo].[StripNonNumerics]
    (
        @Temp varchar(255)
    )
    RETURNS varchar(255)
    AS
    Begin
        Declare @KeepValues as varchar(50)
        Set @KeepValues = '%[^0-9]%'
        While PatIndex(@KeepValues, @Temp) > 0
            Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
        Return @Temp
    End

Then call the function like so to see the original value next to the sanitized value:

 SELECT Something, dbo.StripNonNumerics(Something) FROM TableA 
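
Since the question is ultimately about an update, here is a rough sketch of how the function above might be used in one. It assumes dbo.StripNonNumerics has already been created, and it uses a hypothetical table variable that borrows the column names from the question. NULLIF is there because casting an empty string to int yields 0 in SQL Server, which is probably not what you want for rows that contain no digits.

```sql
-- Assumes dbo.StripNonNumerics from the answer above already exists.
-- @TableName is a hypothetical stand-in for the real table in the question.
DECLARE @TableName TABLE (Columnofdirtydata varchar(255), ColumntoUpdate int NULL);
INSERT INTO @TableName (Columnofdirtydata)
VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB#');

-- Rows without any digits become NULL instead of 0.
UPDATE @TableName
SET ColumntoUpdate = CAST(NULLIF(dbo.StripNonNumerics(Columnofdirtydata), '') AS int)
WHERE ColumntoUpdate IS NULL;

SELECT Columnofdirtydata, ColumntoUpdate
FROM @TableName;
```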

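Here is an elegant solution if your server supports the TRANSLATE function (on SQL Server it is available in SQL Server 2017+, and also in SQL Azure).

First, it replaces any non-numeric characters with a @ character. Then, it removes all @ characters. You may need to add additional characters that you know may be present to the second parameter of the TRANSLATE call.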

 select REPLACE(TRANSLATE([Col], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '') 
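
TRANSLATE raises an error when its second and third arguments differ in length, so counting @ characters by hand is easy to get wrong. The sketch below is one possible variation, not part of the original answer: it keeps the list of characters to strip in a hypothetical variable `@strip` and builds the replacement string with REPLICATE so the lengths always match. It also upper-cases the input so that only uppercase letters need to be listed, which is an assumption made to avoid depending on the column's collation.

```sql
-- Hypothetical list of characters to remove; extend it as needed for your data.
DECLARE @strip varchar(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ+()- ,#:';

-- Sample values from the question, supplied through a table value constructor.
SELECT v.DirtyCol,
       REPLACE(
           TRANSLATE(UPPER(v.DirtyCol), @strip, REPLICATE('@', DATALENGTH(@strip))),
           '@', '') AS Digits
FROM (VALUES ('AB ABCDE # 123'), ('ABCDE# 123'), ('AB: ABC# 123')) AS v (DirtyCol);
```

Any character that is neither a digit nor in `@strip` is left in place, so the list still has to cover everything that can occur in the data.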

    Declare @MainTable table(id int identity(1,1), TextField varchar(100))
    INSERT INTO @MainTable (TextField) VALUES ('6B32E')

    declare @i int = 1
    Declare @originalWord varchar(100) = ''
    WHile @i <= (Select count(*) from @MainTable)
    BEGIN
        Select @originalWord = TextField from @MainTable where id = @i
        Declare @r varchar(max) = '', @len int, @c char(1), @x int = 0
        Select @len = len(@originalWord)
        declare @pn varchar(100) = @originalWord
        while @x <= @len
        begin
            Select @c = SUBSTRING(@pn, @x, 1)
            if (@c != '')
            BEGIN
                if ISNUMERIC(@c) = 0 and @c <> '-'
                BEGIN
                    -- replace the character with its ASCII code minus 64 (so 'A' becomes 1, 'B' becomes 2)
                    Select @r = cast(@r as varchar) + cast(replace((SELECT ASCII(@c)-64),'-','') as varchar)
                end
                ELSE
                BEGIN
                    Select @r = @r + @c
                END
            END
            Select @x = @x + 1
        END
        Select @r
        Set @i = @i + 1
    END

To add on to Ken's answer, this handles commas, spaces, and parentheses:

    --Handles parentheses, commas, spaces, hyphens..
    declare @table table (c varchar(256))
    insert into @table values
        ('This is a test 111-222-3344'),
        ('Some Sample Text (111)-222-3344'),
        ('Hello there 111222 3344 / How are you?'),
        ('Hello there 111 222 3344 ? How are you?'),
        ('Hello there 111 222 3344. How are you?')

    select replace(
        LEFT(
            SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''),
                PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')),
                8000),
            PATINDEX('%[^0-9.-]%',
                SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''),
                    PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')),
                    8000) + 'X') - 1),
        '.', '')
    from @table

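Here is a version which pulls all digits from a string; i.e. given "I'm 35 years old; I was born in 1982. The average family has 2.4 children." this would return 35198224. That makes it a good fit where you have numeric data which may have been formatted as a code (e.g. #123,456,789 / 123-00005), but it is not appropriate if you want to pull specific numbers (as opposed to just the digit characters) out of the text. It also only handles digits, so it will not return negative signs (-) or periods (.).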

    declare @table table (id bigint not null identity (1,1), data nvarchar(max))
    insert @table (data)
    values ('hello 123 its 45613 then') --outputs: 12345613
    ,('1 some other string 98 example 4') --outputs: 1984
    ,('AB ABCDE # 123') --outputs: 123
    ,('ABCDE# 123') --outputs: 123
    ,('AB: ABC# 123') --outputs: 123

    ; with NonNumerics as (
        select id
        , data original
        --the below line replaces all digits with blanks
        , replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(data,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','') nonNumeric
        from @table
    )
    --each iteration of the below CTE removes another non-numeric character from the original string, putting the result into the numerics column
    , Numerics as (
        select id
        , replace(original, substring(nonNumeric,1,1), '') numerics
        , replace(nonNumeric, substring(nonNumeric,1,1), '') charsToreplace
        , len(replace(nonNumeric, substring(nonNumeric,1,1), '')) charsRemaining
        from NonNumerics

        union all

        select id
        , replace(numerics, substring(charsToreplace,1,1), '') numerics
        , replace(charsToreplace, substring(charsToreplace,1,1), '') charsToreplace
        , len(replace(charsToreplace, substring(charsToreplace,1,1), '')) charsRemaining
        from Numerics
        where charsRemaining > 0
    )
    --we select only those strings with `charsRemaining=0`; ie the rows for which all non-numeric characters have been removed; there should be 1 row returned for every 1 row in the original data set.
    select * from Numerics where charsRemaining = 0

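This code works by first removing all the digits (i.e. the characters we want) from the given strings, replacing them with blanks. It then goes through the original string (which still includes the digits) and removes all of the characters that were left over (i.e. the non-numeric characters), thus leaving only the digits.

The reason we do this in two steps, rather than just removing all non-numeric characters in the first place, is that there are only 10 digits, whereas there is a huge number of possible other characters. Replacing that small list is relatively fast, and it gives us a list of the non-numeric characters that actually exist in the string, so we then only have to replace that small set.

The method makes use of recursive SQL, using common table expressions (CTEs).

I have created a function for this: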

    Create FUNCTION RemoveCharacters (@text varchar(30))
    RETURNS VARCHAR(30)
    AS
    BEGIN
        declare @index as int
        declare @newtexval as varchar(30)
        set @index = (select PATINDEX('%[A-Z.-/?]%', @text))
        if (@index = 0)
        begin
            return @text
        end
        else
        begin
            set @newtexval = (select STUFF(@text, @index, 1, ''))
            return dbo.RemoveCharacters(@newtexval)
        end
        return 0
    END
    GO

Here is the answer:

    DECLARE @t TABLE (tVal VARCHAR(100))

    INSERT INTO @t VALUES('123')
    INSERT INTO @t VALUES('123S')
    INSERT INTO @t VALUES('A123,123')
    INSERT INTO @t VALUES('a123..A123')

    ;WITH cte (original, tVal, n)
    AS
    (
        SELECT t.tVal AS original,
               LOWER(t.tVal) AS tVal,
               65 AS n
        FROM @t AS t
        UNION ALL
        SELECT tVal AS original,
               CAST(REPLACE(LOWER(tVal), LOWER(CHAR(n)), '') AS VARCHAR(100)),
               n + 1
        FROM cte
        WHERE n <= 90
    )
    SELECT t1.tVal AS OldVal,
           t.tval AS NewVal
    FROM (
        SELECT original,
               tVal,
               ROW_NUMBER() OVER(PARTITION BY tVal + original ORDER BY original) AS Sl
        FROM cte
        WHERE PATINDEX('%[a-z]%', tVal) = 0
    ) t
    INNER JOIN @t t1 ON t.original = t1.tVal
    WHERE t.sl = 1

    create function fn_GetNumbersOnly(@pn varchar(100))
    Returns varchar(max)
    AS
    BEGIN
        Declare @r varchar(max) = '', @len int, @c char(1), @x int = 0
        Select @len = len(@pn)
        while @x <= @len
        begin
            Select @c = SUBSTRING(@pn, @x, 1)
            -- note: ISNUMERIC also returns 1 for a few symbols such as '.', ',', '+' and '$'
            if ISNUMERIC(@c) = 1 and @c <> '-'
                Select @r = @r + @c
            Select @x = @x + 1
        end
        return @r
    END