Parsing ordered timestamps in local time (to UTC) while observing daylight saving time

I have CSV data files whose records are timestamped in local time. Unfortunately, the data files cover the period in which daylight saving time changes (November 3, 2013), so the time components of the record timestamps go: 12:45, 1:00, 1:15, 1:30, 1:45, 1:00, 1:15, 1:30, 1:45, 2:00. I want to be able to convert the values to UTC and store them in the database.

Unfortunately, the standard .NET DateTime.Parse() function parses them like this (all on November 3, 2013):

| Time String | Parsed Local Time | In DST | Parsed Local Time to UTC |
|-------------|-------------------|--------|--------------------------|
| 12:45 am    | 12:45 am          | Yes    | 4:45 am                  |
| 12:59:59 am | 12:59:59 am       | Yes    | 4:59:59 am               |
| 01:00 am    | 1:00 am           | No     | 6:00 am                  |
| 01:15 am    | 1:15 am           | No     | 6:15 am                  |

So it never treats the 1:00-1:59:59 am range as being in DST, and my parsed timestamps jump an hour in UTC.

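Is there a library or class that will let me parse the timestamps and take the DST change into account? Something like an instantiable class that remembers the stream of timestamps it has already received and adjusts each parsed timestamp accordingly?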

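Assumptions that can be made about the data while parsing:

  1. The header section of the file gives the start time of the file (the timestamp of the first record) in both local time and UTC.
  2. The records are in order by timestamp.
  3. All local times are in the US Eastern time zone.
  4. The data could also go the other way: from out of DST into DST.
  5. The records contain a full timestamp in the format yyyy/mm/dd HH:mm:ss, for example 2013/11/03 00:45:00.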

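Note: Although my software is written in C#, I did not tag this C#/.NET specifically, because I figure I can take a solution implemented in any language and recode it if needed.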

In C#:

    using System;
    using System.Globalization;

    // Define the input values.
    string[] input =
    {
        "2013-11-03 00:45:00",
        "2013-11-03 01:00:00",
        "2013-11-03 01:15:00",
        "2013-11-03 01:30:00",
        "2013-11-03 01:45:00",
        "2013-11-03 01:00:00",
        "2013-11-03 01:15:00",
        "2013-11-03 01:30:00",
        "2013-11-03 01:45:00",
        "2013-11-03 02:00:00",
    };

    // Get the time zone the input is meant to be interpreted in.
    TimeZoneInfo tz = TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time");

    // Create an array for the output values
    DateTimeOffset[] output = new DateTimeOffset[input.Length];

    // Start with the assumption that DST is active, as ambiguities occur when moving
    // out of daylight time into standard time.
    bool dst = true;

    // Iterate through the input.
    for (int i = 0; i < input.Length; i++)
    {
        // Parse the input string as a DateTime with Unspecified kind
        DateTime dt = DateTime.ParseExact(input[i], "yyyy-MM-dd HH:mm:ss",
                                          CultureInfo.InvariantCulture);

        // Determine the offset.
        TimeSpan offset;
        if (tz.IsAmbiguousTime(dt))
        {
            // Get the possible offsets, and use the DST flag and the previous entry
            // to determine if we are past the transition point. This only works
            // because we have outside knowledge that the items are in sequence.
            TimeSpan[] offsets = tz.GetAmbiguousTimeOffsets(dt);
            offset = dst && (i == 0 || dt >= output[i - 1].DateTime)
                ? offsets[1]
                : offsets[0];
        }
        else
        {
            // The value is unambiguous, so just get the single offset it can be.
            offset = tz.GetUtcOffset(dt);
        }

        // Use the determined values to construct a DateTimeOffset
        DateTimeOffset dto = new DateTimeOffset(dt, offset);

        // We can unambiguously check a DateTimeOffset for daylight saving time,
        // which sets up the DST flag for the next iteration.
        dst = tz.IsDaylightSavingTime(dto);

        // Save the DateTimeOffset to the output array.
        output[i] = dto;
    }

    // Show the output for debugging
    foreach (var dto in output)
    {
        Console.WriteLine("{0:yyyy-MM-dd HH:mm:ss zzzz} => {1:yyyy-MM-dd HH:mm:ss} UTC",
                          dto, dto.UtcDateTime);
    }

Output:

    2013-11-03 00:45:00 -04:00 => 2013-11-03 04:45:00 UTC
    2013-11-03 01:00:00 -04:00 => 2013-11-03 05:00:00 UTC
    2013-11-03 01:15:00 -04:00 => 2013-11-03 05:15:00 UTC
    2013-11-03 01:30:00 -04:00 => 2013-11-03 05:30:00 UTC
    2013-11-03 01:45:00 -04:00 => 2013-11-03 05:45:00 UTC
    2013-11-03 01:00:00 -05:00 => 2013-11-03 06:00:00 UTC
    2013-11-03 01:15:00 -05:00 => 2013-11-03 06:15:00 UTC
    2013-11-03 01:30:00 -05:00 => 2013-11-03 06:30:00 UTC
    2013-11-03 01:45:00 -05:00 => 2013-11-03 06:45:00 UTC
    2013-11-03 02:00:00 -05:00 => 2013-11-03 07:00:00 UTC

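Note that this assumes that the first time you encounter an ambiguous time such as 1:00, it is in DST. Say your list were truncated to just the last 5 entries: the code would have no way of knowing that those are in standard time. There is not much you can do in that particular case using the record timestamps alone.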
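The question does mention that the file header carries the first timestamp in both local time and UTC, which is enough to settle the truncated case for the first record. The following is a minimal Python sketch of that idea, assuming Python 3.9+ with the standard `zoneinfo` module and available time zone data; `resolve_with_utc` is a hypothetical helper name, not part of any library. It tries both readings of the wall-clock time and keeps the one that matches the known UTC instant.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def resolve_with_utc(local_naive, utc_naive, tz):
    """Return the aware local time whose UTC instant equals utc_naive.

    Hypothetical helper: fold=0 is the first occurrence of an ambiguous
    wall-clock time (still in DST), fold=1 is the second (standard time).
    """
    target = utc_naive.replace(tzinfo=timezone.utc)
    for fold in (0, 1):
        candidate = local_naive.replace(tzinfo=tz, fold=fold)
        # Compare in UTC to avoid the special-case rules for comparing
        # ambiguous aware datetimes across time zones.
        if candidate.astimezone(timezone.utc) == target:
            return candidate
    raise ValueError("header local time and UTC time do not agree")


eastern = ZoneInfo("America/New_York")
first = resolve_with_utc(datetime(2013, 11, 3, 1, 0),
                         datetime(2013, 11, 3, 6, 0), eastern)
print(first.isoformat(), "DST" if first.dst() else "standard")
```

The DST state of the value returned for the header record can then seed the `dst` flag (or the "previous timestamp") that the sequential approach relies on.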

If successive timestamps cannot go backwards when expressed as time in UTC, then this Python script can convert the local time to UTC:

    #!/usr/bin/env python3
    import sys
    from datetime import datetime

    import pytz  # $ pip install pytz

    tz = pytz.timezone('America/New_York' if len(sys.argv) < 2 else sys.argv[1])
    previous = None  # XXX set it from UTC time: `first_entry_utc.astimezone(tz)`
    for line in sys.stdin:  # read from stdin
        naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M:%S")  # no timezone
        try:
            local = tz.localize(naive, is_dst=None)  # attach timezone info
        except pytz.AmbiguousTimeError:
            # assume ambiguous time always corresponds to True -> False transition
            local = tz.localize(naive, is_dst=True)
            # `previous` is None for the first line unless it was seeded above
            if previous is not None and previous >= local:
                # timestamps must be increasing
                local = tz.localize(naive, is_dst=False)
            assert previous is None or previous < local
        # NOTE: allow NonExistentTimeError to propagate (there shouldn't be
        # invalid local times in the input)
        previous = local
        utc = local.astimezone(pytz.utc)
        timestamp = utc.timestamp()
        time_format = "%Y-%m-%d %H:%M:%S %Z%z"
        print("{local:{time_format}}; {utc:{time_format}}; {timestamp:.0f}"
              .format_map(vars()))

Input:

    2013/11/03 00:45:00
    2013/11/03 01:00:00
    2013/11/03 01:15:00
    2013/11/03 01:30:00
    2013/11/03 01:45:00
    2013/11/03 01:00:00
    2013/11/03 01:15:00
    2013/11/03 01:30:00
    2013/11/03 01:45:00
    2013/11/03 02:00:00

Output:

    2013-11-03 00:45:00 EDT-0400; 2013-11-03 04:45:00 UTC+0000; 1383453900
    2013-11-03 01:00:00 EDT-0400; 2013-11-03 05:00:00 UTC+0000; 1383454800
    2013-11-03 01:15:00 EDT-0400; 2013-11-03 05:15:00 UTC+0000; 1383455700
    2013-11-03 01:30:00 EDT-0400; 2013-11-03 05:30:00 UTC+0000; 1383456600
    2013-11-03 01:45:00 EDT-0400; 2013-11-03 05:45:00 UTC+0000; 1383457500
    2013-11-03 01:00:00 EST-0500; 2013-11-03 06:00:00 UTC+0000; 1383458400
    2013-11-03 01:15:00 EST-0500; 2013-11-03 06:15:00 UTC+0000; 1383459300
    2013-11-03 01:30:00 EST-0500; 2013-11-03 06:30:00 UTC+0000; 1383460200
    2013-11-03 01:45:00 EST-0500; 2013-11-03 06:45:00 UTC+0000; 1383461100
    2013-11-03 02:00:00 EST-0500; 2013-11-03 07:00:00 UTC+0000; 1383462000
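On Python 3.9 or later, the same idea can probably be expressed without a third-party dependency, since the standard `zoneinfo` module and the `fold` attribute cover what `tz.localize(..., is_dst=...)` does here. What follows is only a sketch under that assumption: `localize_ordered` and its `previous_utc` parameter are hypothetical names, and, like the script above, it relies on the records being strictly increasing in UTC.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def localize_ordered(lines, tz_name="America/New_York", previous_utc=None):
    """Yield UTC datetimes for ordered local timestamps (hypothetical helper).

    previous_utc may be seeded with an aware UTC datetime that precedes
    the first record, e.g. one derived from the file header.
    """
    tz = ZoneInfo(tz_name)
    for line in lines:
        naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M:%S")
        # First reading: fold=0, i.e. the earlier of two possible instants.
        utc = naive.replace(tzinfo=tz, fold=0).astimezone(timezone.utc)
        if previous_utc is not None and utc <= previous_utc:
            # Going backwards: take the second reading (fold=1) instead.
            utc = naive.replace(tzinfo=tz, fold=1).astimezone(timezone.utc)
            if utc <= previous_utc:
                raise ValueError("timestamps are not increasing: %r" % line)
        previous_utc = utc
        yield utc


records = ["2013/11/03 01:45:00", "2013/11/03 01:00:00", "2013/11/03 01:15:00"]
for utc in localize_ordered(records):
    print(utc.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Unlike the pytz version, this sketch does not detect nonexistent local times (the gap when entering DST); `zoneinfo` silently assigns them an offset, so a separate check would be needed if such input is possible.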