Reading Excel Open XML is ignoring blank cells

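I am using a previously accepted solution to convert an Excel sheet into a DataTable. This works fine if I have "perfect" data, but if there is a blank cell in the middle of my data, the wrong data seems to end up in each column.

I think this is because, in the code below,

    row.Descendants<Cell>().Count()

is the number of populated cells (not the number of columns), and

    GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily the one at that index). So if the first column is empty and I call ElementAt(0), it returns the value in the second column.

Here is the full parsing code:

    DataRow tempRow = dt.NewRow();

    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
    {
        tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
        if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
        {
            Console.Write(tempRow[i].ToString());
        }
    }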

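This makes sense, since Excel does not store a value for a cell that is empty. If you open your file with the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level, you will see that only the cells that have data are present in the file.

Your options are to insert blank data into the range of cells you are going to traverse, or to work out programmatically that a cell was skipped and adjust your index accordingly.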

I made an example Excel document with a string in cells A1 and C1. I then opened the document in the Open XML Productivity Tool, and here is the XML that was stored:

    <x:row r="1" spans="1:3" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
      <x:c r="A1" t="s">
        <x:v>0</x:v>
      </x:c>
      <x:c r="C1" t="s">
        <x:v>1</x:v>
      </x:c>
    </x:row>

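Here you can see that the data corresponds to the first row and that only two cells' worth of data are saved for that row. The saved data corresponds to A1 and C1; no cell with an empty value is saved.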
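To make the effect concrete without needing the Open XML SDK, here is a minimal sketch that parses the same row XML with System.Xml.Linq. The class and method names (SparseRowDemo, StoredCellReferences) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SparseRowDemo
{
    // The row from the example: only A1 and C1 are stored, B1 is absent.
    public const string RowXml =
        "<x:row r=\"1\" spans=\"1:3\" xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">" +
        "<x:c r=\"A1\" t=\"s\"><x:v>0</x:v></x:c>" +
        "<x:c r=\"C1\" t=\"s\"><x:v>1</x:v></x:c>" +
        "</x:row>";

    // Returns the "r" attribute of every stored <c> element, in document order.
    public static string[] StoredCellReferences(string rowXml)
    {
        XNamespace x = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        return XElement.Parse(rowXml)
            .Elements(x + "c")
            .Select(c => (string)c.Attribute("r"))
            .ToArray();
    }
}
```

StoredCellReferences(SparseRowDemo.RowXml) yields two entries, "A1" and "C1". Position 1 in that sequence is therefore column C, not column B, which is the same misalignment that ElementAt(i) produces in the question.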

To get the functionality you need, you can traverse the cells as you do above, but you have to check which cell each Cell refers to and determine whether any cells have been skipped. To do that you need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:

    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ).
    /// A length of three can be implemented when needed.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        int? columnIndex = null;

        // Split into single letters: "AB" -> { "A", "B" }
        string[] colLetters = Regex.Split(columnName, "([A-Z])");
        colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();

        if (colLetters.Count() <= 2)
        {
            int index = 0;
            foreach (string col in colLetters)
            {
                // Each entry holds exactly one character
                int indexValue = Letters.IndexOf(col[0]);

                if (indexValue != -1)
                {
                    // The first letter of a two digit column needs some extra calculations
                    if (index == 0 && colLetters.Count() == 2)
                    {
                        columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);
                    }
                    else
                    {
                        columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;
                    }
                }

                index++;
            }
        }

        return columnIndex;
    }

Then you can iterate over the cells and compare each cell's column index with columnIndex. If columnIndex is smaller, add blank data to your tempRow until it catches up; otherwise just read the value contained in the cell. (Note: I did not test the code below, but the general idea should help.)

    DataRow tempRow = dt.NewRow();

    int columnIndex = 0;
    foreach (Cell cell in row.Descendants<Cell>())
    {
        // Gets the column index of the cell with data
        int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

        if (columnIndex < cellColumnIndex)
        {
            do
            {
                tempRow[columnIndex] = string.Empty; // Insert blank data here
                columnIndex++;
            }
            while (columnIndex < cellColumnIndex);
        }
        tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

        if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
        {
            Console.Write(tempRow[columnIndex].ToString());
        }
        columnIndex++;
    }
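The same gap-filling idea can be tried without the SDK. The sketch below works on plain (reference, value) pairs instead of Cell objects; GapFillSketch, ColumnIndex and Expand are hypothetical names chosen for this illustration, and the column conversion is a generic base-26 loop rather than the helper shown above.

```csharp
using System;
using System.Collections.Generic;

public static class GapFillSketch
{
    // Zero-based column index for a reference such as "C1" or "AB12".
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (!char.IsLetter(ch)) break; // stop at the row number
            index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        }
        return index - 1;
    }

    // Expands sparse (reference, value) pairs into a dense list,
    // adding `blank` for every column that was skipped in between.
    public static List<string> Expand(IEnumerable<KeyValuePair<string, string>> storedCells, string blank = "")
    {
        var dense = new List<string>();
        foreach (var cell in storedCells)
        {
            int target = ColumnIndex(cell.Key);
            while (dense.Count < target) dense.Add(blank);
            dense.Add(cell.Value);
        }
        return dense;
    }
}
```

Given the pairs ("A1", "x") and ("C1", "y"), Expand returns three entries, with an empty string in the middle for the missing B1. Note that this only fills gaps before a stored cell; blanks at the end of a row need a known column count.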

Here is an implementation of IEnumerable that should do what you want. It has been compiled and unit tested.

    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public IEnumerator<Cell> GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

Here are the functions it relies on:

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

Throw it in a class and give it a try.

Here is a slightly modified version of Waylon's answer, which also relies on other answers. It encapsulates his method in a class.

I changed

    IEnumerator<Cell> GetEnumerator()

to

    IEnumerable<Cell> GetRowCells(Row row)

Here is the class. You don't need to instantiate it; it just serves as a utility class:

    public class SpreedsheetHelper
    {
        ///<summary>returns an empty cell when a blank cell is encountered
        ///</summary>
        public static IEnumerable<Cell> GetRowCells(Row row)
        {
            int currentCount = 0;

            foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
                row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
            {
                string columnName = GetColumnName(cell.CellReference);

                int currentColumnIndex = ConvertColumnNameToNumber(columnName);

                for (; currentCount < currentColumnIndex; currentCount++)
                {
                    yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
                }

                yield return cell;
                currentCount++;
            }
        }

        /// <summary>
        /// Given a cell name, parses the specified cell to get the column name.
        /// </summary>
        /// <param name="cellReference">Address of the cell (ie. B2)</param>
        /// <returns>Column Name (ie. B)</returns>
        public static string GetColumnName(string cellReference)
        {
            // Match the column name portion of the cell name.
            var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
            var match = regex.Match(cellReference);

            return match.Value;
        }

        /// <summary>
        /// Given just the column name (no row index),
        /// it will return the zero based column index.
        /// </summary>
        /// <param name="columnName">Column Name (ie. A or AB)</param>
        /// <returns>Zero based index if the conversion was successful</returns>
        /// <exception cref="ArgumentException">thrown if the given string
        /// contains characters other than uppercase letters</exception>
        public static int ConvertColumnNameToNumber(string columnName)
        {
            var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
            if (!alpha.IsMatch(columnName)) throw new ArgumentException();

            char[] colLetters = columnName.ToCharArray();
            Array.Reverse(colLetters);

            int convertedValue = 0;
            for (int i = 0; i < colLetters.Length; i++)
            {
                char letter = colLetters[i];
                int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
                convertedValue += current * (int)Math.Pow(26, i);
            }

            return convertedValue;
        }
    }

Now you can get the cells of every row this way:

    // skip the part that retrieves the worksheet sheetData

    IEnumerable<Row> rows = sheetData.Descendants<Row>();
    foreach (Row row in rows)
    {
        IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
        foreach (Cell cell in cells)
        {
            // skip part that reads the text according to the cell-type
        }
    }

It will contain all cells up to the last populated one, even the ones that are empty.

See my implementation:

    Row[] rows = worksheet.GetFirstChild<SheetData>()
        .Elements<Row>()
        .ToArray();

    string[] columnNames = rows.First()
        .Elements<Cell>()
        .Select(cell => GetCellValue(cell, document))
        .ToArray();

    HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());

    if (columnNames.Count() != HeaderLetters.Count())
    {
        throw new ArgumentException("HeaderLetters");
    }

    IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);

    //Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.

HeaderLetters are collected using this class:

    private static class ExcelHeaderHelper
    {
        public static string[] GetHeaderLetters(uint max)
        {
            var result = new List<string>();
            int i = 0;
            var columnPrefix = new Queue<string>();
            string prefix = null;
            int prevRoundNo = 0;
            uint maxPrefix = max / 26;

            while (i < max)
            {
                int roundNo = i / 26;
                if (prevRoundNo < roundNo)
                {
                    prefix = columnPrefix.Dequeue();
                    prevRoundNo = roundNo;
                }
                string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                if (i <= maxPrefix)
                {
                    columnPrefix.Enqueue(item);
                }
                result.Add(item);
                i++;
            }
            return result.ToArray();
        }
    }

And the helper methods are:

    private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
    {
        var result = new List<List<string>>();
        foreach (var row in rows)
        {
            List<string> cellValues = new List<string>();
            var actualCells = row.Elements<Cell>().ToArray();

            int j = 0;
            for (int i = 0; i < columnCount; i++)
            {
                // No stored cell left, or the next stored cell belongs to a later column
                if (actualCells.Count() <= j || !actualCells[j].CellReference.ToString().StartsWith(HeaderLetters[i]))
                {
                    cellValues.Add(null);
                }
                else
                {
                    cellValues.Add(GetCellValue(actualCells[j], document));
                    j++;
                }
            }
            result.Add(cellValues);
        }
        return result;
    }

    private static string GetCellValue(Cell cell, SpreadsheetDocument document)
    {
        bool sstIndexedcell = GetCellType(cell);
        return sstIndexedcell
            ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
            : cell.InnerText;
    }

    private static bool GetCellType(Cell cell)
    {
        return cell.DataType != null && cell.DataType == CellValues.SharedString;
    }

    private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
    {
        return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
    }

The solution deals with shared string items (SST-indexed cells).
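For readers unfamiliar with SST-indexed cells: a cell with t="s" stores an index into the workbook's shared string table instead of the text itself. The following is a small sketch of that lookup using System.Xml.Linq on raw XML fragments rather than the SDK; SharedStringSketch and Resolve are hypothetical names, and the fragments passed in are assumed to use the standard SpreadsheetML namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SharedStringSketch
{
    static readonly XNamespace Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Resolves the text of a <c> element against an <sst> element.
    public static string Resolve(XElement cell, XElement sharedStrings)
    {
        // <v> holds either the literal value or the shared string index.
        string raw = (string)cell.Element(Ns + "v") ?? string.Empty;

        // Anything other than t="s" is returned as stored.
        if ((string)cell.Attribute("t") != "s")
            return raw;

        // Each <si> child is one table entry; Value concatenates its text nodes.
        int id = int.Parse(raw);
        return sharedStrings.Elements(Ns + "si").ElementAt(id).Value;
    }
}
```

This mirrors what GetSharedStringItemById does above: take the integer in the cell, pick the entry at that position in the table, and return its text.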

All good examples. Here is the one I am using, since I need to keep track of all rows, cells, values, and titles for correlation and analysis.

The ReadSpreadsheet method opens an xlsx file and goes through each worksheet, row, and column. Since the values are stored in a referenced string table, I also explicitly use that for each worksheet. Two other classes are used: DSFunction and StaticVariables. The latter holds frequently used parameter values such as 'quotdouble' (quotdouble = "\u0022";) and 'crlf' (crlf = "\u000D" + "\u000A";).

The relevant DSFunction method, GetIntColIndexForLetter, is included below. It returns the integer column index corresponding to a letter name (A, B, AA, ADE, etc.). It is used together with the variable 'ncellcolref' to determine whether any columns have been skipped and to enter an empty string value for each missing one.

I also do some cleaning of the values (using the Replace method) before storing them temporarily in a List object.

Subsequently, I use the hashtable (Dictionary) of column names to extract values across different worksheets, correlate them, create normalized values, and then create an object used in our product, which is stored as an XML file. None of that is shown, but it is why this approach is used.

    public static class DSFunction
    {
        /// <summary>
        /// Creates an integer value for a column letter name starting at 1 for 'a'
        /// </summary>
        /// <param name="lettstr">Column name as letters</param>
        /// <returns>int value</returns>
        public static int GetIntColIndexForLetter(string lettstr)
        {
            string txt = "", txt1 = "";
            int n1, result = 0, nbeg = -1;

            try
            {
                nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
                txt = lettstr;
                if (txt != "") txt = txt.ToLower().Trim();
                while (txt != "")
                {
                    if (txt.Length > 1)
                    {
                        txt1 = txt.Substring(0, 1);
                        txt = txt.Substring(1);
                    }
                    else
                    {
                        txt1 = txt;
                        txt = "";
                    }
                    // IsNumberString is another DSFunction helper that is not shown here
                    if (!DSFunction.IsNumberString(txt1, "real"))
                    {
                        n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
                        result = result * 26 + n1; // positional base 26, so "aa" gives 27
                    }
                    else
                    {
                        break;
                    }
                }
            }
            catch (Exception ex)
            {
                txt = ex.Message;
            }
            return result;
        }
    }

    public static class Extractor
    {
        public static string ReadSpreadsheet(string fileUri)
        {
            string msg = "", txt = "", txt1 = "";
            int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
            Boolean haveheader = true;
            Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
            List<string> colvalues = new List<string>();
            try
            {
                if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
                using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true))
                {
                    var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                    foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>())
                    {
                        nrow = 0;
                        foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>())
                        {
                            ncell = 0;
                            ncellcolref = 0;
                            nrow++;
                            colvalues.Clear();
                            foreach (Cell sscell in ssrow.Elements<Cell>())
                            {
                                ncell++;
                                n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
                                // One pass per column skipped since the previous stored cell
                                for (i = 0; i < (n1 - ncellcolref - 1); i++)
                                {
                                    if (nrow == 1 && haveheader)
                                    {
                                        txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
                                        if (!hashcolnames.TryGetValue(txt1, out n2))
                                        {
                                            hashcolnames.Add(txt1, ncell - 1);
                                        }
                                    }
                                    else
                                    {
                                        colvalues.Add("");
                                    }
                                }
                                ncellcolref = n1;
                                if (sscell.DataType != null)
                                {
                                    if (sscell.DataType.Value == CellValues.SharedString && stringTable != null)
                                    {
                                        txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
                                    }
                                    else if (sscell.DataType.Value == CellValues.String)
                                    {
                                        txt = sscell.InnerText;
                                    }
                                    else txt = sscell.InnerText.ToString();
                                }
                                else txt = sscell.InnerText;
                                if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
                                if (nrow == 1 && haveheader)
                                {
                                    txt1 = txt1.Replace(" ", "");
                                    if (txt1 == "table/viewname") txt1 = "tablename";
                                    else if (txt1 == "schemaownername") txt1 = "schemaowner";
                                    else if (txt1 == "subjectareaname") txt1 = "subjectarea";
                                    else if (txt1.StartsWith("column"))
                                    {
                                        txt1 = txt1.Substring("column".Length);
                                    }
                                    if (!hashcolnames.TryGetValue(txt1, out n1))
                                    {
                                        hashcolnames.Add(txt1, ncell - 1);
                                    }
                                }
                                else
                                {
                                    txt = txt.Replace(((char)8220).ToString(), "'"); //special "
                                    txt = txt.Replace(((char)8221).ToString(), "'"); //special "
                                    txt = txt.Replace(StaticVariables.quotdouble, "'");
                                    txt = txt.Replace(StaticVariables.crlf, " ");
                                    txt = txt.Replace(" ", " ");
                                    txt = txt.Replace("<", "");
                                    txt = txt.Replace(">", "");
                                    colvalues.Add(txt);
                                }
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                msg = "notok:" + ex.Message;
            }
            return msg;
        }
    }

The letter code is a base-26 encoding, so this should work to convert it into an offset:

    // Converts letter code (ie AA) to an offset
    public int offset(string code)
    {
        var offset = 0;
        var byte_array = Encoding.ASCII.GetBytes(code).Reverse().ToArray();
        for (var i = 0; i < byte_array.Length; i++)
        {
            offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
        }
        return offset - 1;
    }
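Written out as a formula, the conversion above looks like this. The notation (v for the letter value, c_k for the k-th letter from the left) is introduced here only for illustration. Strictly speaking it is a base-26 system without a zero digit, because A counts as 1 rather than 0, which is why 1 is subtracted once at the end.

```latex
\[
\text{offset}(c_1 c_2 \dots c_n) \;=\; \sum_{k=1}^{n} v(c_k)\, 26^{\,n-k} \;-\; 1,
\qquad v(\mathrm{A}) = 1,\ \dots,\ v(\mathrm{Z}) = 26
\]
\[
\text{offset}(\mathrm{AB}) = 1 \cdot 26 + 2 - 1 = 27,
\qquad
\text{offset}(\mathrm{ZZ}) = 26 \cdot 26 + 26 - 1 = 701
\]
```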

You can use this function to extract a cell from a row by passing the header index:

    public static Cell GetCellFromRow(Row r, int headerIdx)
    {
        string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
        IEnumerable<Cell> cells = r.Elements<Cell>().Where(x => x.CellReference == cellname);
        if (cells.Count() > 0)
        {
            return cells.First();
        }
        else
        {
            return null;
        }
    }

    // n is one-based: 1 -> "A", 27 -> "AA"
    public static string GetNthColumnName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }

Okay, I'm not exactly an expert on this, but the other answers seem like overkill to me, so here is my solution:

    // Loop through each row in the spreadsheet, skipping the header row
    foreach (var row in sheetData.Elements<Row>().Skip(1))
    {
        var i = 0;
        string[] letters = new string[15] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };
        List<String> cellsList = new List<string>();
        foreach (var cell in row.Elements<Cell>().ToArray())
        {
            while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
            {
                //accounts for multiple consecutive blank cells
                cellsList.Add("");
                i++;
            }
            cellsList.Add(cell.CellValue.Text);
            i++;
        }
        string[] cells = cellsList.ToArray();
        foreach (var cell in cellsList)
        {
            //display contents of cell, depending on the datatype you may need to call each of the cells manually
        }
    }

Hope someone finds this useful!

I couldn't resist optimizing the subroutines from Amurra's answer to remove the need for regular expressions.

The first function isn't actually needed, since the second one accepts either a cell reference (C3) or a column name (C), but it is still a nice helper function. The indices are also one-based (only because our implementation uses one-based rows to match Excel visually).

    /// <summary>
    /// Given a cell name, return the cell column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    /// <exception cref="ArgumentOutOfRangeException">cellReference</exception>
    public static string GetColumnName(string cellReference)
    {
        // Advance from L to R until a number, then return 0 through previous position
        for (int lastCharPos = 0; lastCharPos <= 3; lastCharPos++)
            if (Char.IsNumber(cellReference[lastCharPos]))
                return cellReference.Substring(0, lastCharPos);

        throw new ArgumentOutOfRangeException("cellReference");
    }

    /// <summary>
    /// Return one-based column index given a cell name or column name
    /// </summary>
    /// <param name="columnNameOrCellReference">Column Name (ie. A, AB3, or AB44)</param>
    /// <returns>One based index of the column</returns>
    public static int GetColumnIndexFromName(string columnNameOrCellReference)
    {
        int columnIndex = 0;
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
        {
            if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
            {
                columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }

Here is my solution. I found that the approaches above did not work well when the missing fields were at the end of a row.

Assuming the first row of the Excel sheet has all columns (as headers), get the number of columns expected per row from it (row == 1). Then loop through the data rows (row > 1). The key to handling the missing cells is the getRowCells method, which receives the known number of columns along with the current row to process.

    int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();

    IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);

    List<List<string>> docData = new List<List<string>>();

    foreach (Row row in rows)
    {
        List<Cell> cells = getRowCells(columnCount, row);

        List<string> rowData = new List<string>();

        foreach (Cell cell in cells)
        {
            rowData.Add(getCellValue(workbookPart, cell));
        }

        docData.Add(rowData);
    }

The getRowCells method currently has a limitation: it can only support a sheet (row) with at most 26 columns. A loop based on the known column count is used to find the missing columns (cells). When one is found, a new Cell is inserted into the cells collection, with a default value of "" instead of null. The modified collection is then returned.

    private static List<Cell> getRowCells(int columnCount, Row row)
    {
        const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

        if (columnCount > COLUMN_LETTERS.Length)
        {
            throw new ArgumentException(string.Format("Invalid columnCount ({0}). Cannot be greater than {1}",
                columnCount, COLUMN_LETTERS.Length));
        }

        List<Cell> cells = row.Descendants<Cell>().ToList();

        for (int i = 0; i < columnCount; i++)
        {
            if (i < cells.Count)
            {
                string cellColumnReference = cells.ElementAt(i).CellReference.ToString();
                // The stored cell at this position belongs to a later column: pad here
                if (cellColumnReference[0] != COLUMN_LETTERS[i])
                {
                    cells.Insert(i, new Cell() { CellValue = new CellValue("") });
                }
            }
            else
            {
                // Past the last stored cell: pad the end of the row
                cells.Insert(i, new Cell() { CellValue = new CellValue("") });
            }
        }

        return cells;
    }

    private static string getCellValue(WorkbookPart workbookPart, Cell cell)
    {
        SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
        string value = (cell.CellValue != null) ? cell.CellValue.InnerXml : string.Empty;

        if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
        {
            return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
        }
        else
        {
            return value;
        }
    }

To read blank cells, I use a variable named "CN" that is declared outside the cell-reading loop. Inside the loop I check whether the column index of the current cell is greater than this variable, which is incremented after each cell is read. If they do not match, I fill the skipped columns with the value I want. This is the trick I use to put blank cells into their respective columns. Here is the code:

    public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
    {
        /*Creating a table with 20 columns*/
        var dt = CreateProviderRvenueSharingTable();

        try
        {
            /*using stream so that if excel file is in another process then it can read without error*/
            using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
                {
                    var workbookPart = spreadsheetDocument.WorkbookPart;
                    var workbook = workbookPart.Workbook;

                    /*get only unhide tabs*/
                    var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);

                    foreach (var sheet in sheets)
                    {
                        var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);

                        /*Remove empty sheets*/
                        List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
                            .Where(r => r.InnerText != string.Empty).ToList();

                        if (rows.Count > 1)
                        {
                            OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

                            int i = 0;
                            int BTR = 0; /*Break the reader while empty rows are found*/

                            while (reader.Read())
                            {
                                if (reader.ElementType == typeof(Row))
                                {
                                    /*ignoring first row with headers and check if data is there after header*/
                                    if (i < 2)
                                    {
                                        i++;
                                        continue;
                                    }

                                    reader.ReadFirstChild();

                                    DataRow row = dt.NewRow();

                                    int CN = 0;

                                    if (reader.ElementType == typeof(Cell))
                                    {
                                        do
                                        {
                                            Cell c = (Cell)reader.LoadCurrentElement();

                                            /*reader skipping blank cells so data is getting wrong in datatable's rows according to header*/
                                            if (CN != 0)
                                            {
                                                int cellColumnIndex =
                                                    ExcelHelper.GetColumnIndexFromName(
                                                        ExcelHelper.GetColumnName(c.CellReference));

                                                if (cellColumnIndex < 20 && CN < cellColumnIndex - 1)
                                                {
                                                    do
                                                    {
                                                        row[CN] = string.Empty;
                                                        CN++;
                                                    } while (CN < cellColumnIndex - 1);
                                                }
                                            }

                                            /*stopping execution if first cell does not have any value which means empty row*/
                                            if (CN == 0 && c.DataType == null && c.CellValue == null)
                                            {
                                                BTR++;
                                                break;
                                            }

                                            string cellValue = GetCellValue(c, workbookPart);
                                            row[CN] = cellValue;
                                            CN++;

                                            /*if any text exists after T column (index 20) then skip the reader*/
                                            if (CN == 20)
                                            {
                                                break;
                                            }
                                        } while (reader.ReadNextSibling());
                                    }

                                    /*reader skipping blank cells so fill the array upto 19 index*/
                                    while (CN != 0 && CN < 20)
                                    {
                                        row[CN] = string.Empty;
                                        CN++;
                                    }

                                    if (CN == 20)
                                    {
                                        dt.Rows.Add(row);
                                    }
                                }
                                /*escaping empty rows below data filled rows after checking 5 times */
                                if (BTR > 5)
                                    break;
                            }
                            reader.Close();
                        }
                    }
                }
            }
        }
        catch (Exception)
        {
            throw; // rethrow without resetting the stack trace
        }
        return dt;
    }

    private static string GetCellValue(Cell c, WorkbookPart workbookPart)
    {
        string cellValue = string.Empty;
        if (c.DataType != null && c.DataType == CellValues.SharedString)
        {
            SharedStringItem ssi =
                workbookPart.SharedStringTablePart.SharedStringTable
                    .Elements<SharedStringItem>()
                    .ElementAt(int.Parse(c.CellValue.InnerText));
            if (ssi.Text != null)
            {
                cellValue = ssi.Text.Text;
            }
        }
        else
        {
            if (c.CellValue != null)
            {
                cellValue = c.CellValue.InnerText;
            }
        }
        return cellValue;
    }

    public static int GetColumnIndexFromName(string columnNameOrCellReference)
    {
        int columnIndex = 0;
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--) // R to L
        {
            if (Char.IsLetter(columnNameOrCellReference[pos])) // for letters (columnName)
            {
                columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }

    public static string GetColumnName(string cellReference)
    {
        /* Advance from L to R until a number, then return 0 through previous position*/
        for (int lastCharPos = 0; lastCharPos <= 3; lastCharPos++)
            if (Char.IsNumber(cellReference[lastCharPos]))
                return cellReference.Substring(0, lastCharPos);

        throw new ArgumentOutOfRangeException("cellReference");
    }

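The code handles the following:

  1. It reads blank cells.
  2. It skips empty rows once the data has been read.
  3. It reads the sheet from the first row in ascending order.
  4. If the Excel file is in use by another process, OpenXML can still read it.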

With apologies for posting yet another answer to this question, here is the code I used.

I was having problems with OpenXML not working properly if a worksheet had a blank row at the top. It would sometimes just return a DataTable with 0 rows and 0 columns. The code below copes with this, and with all other worksheets.

Here is how you would call my code. Just pass in the filename and the name of the worksheet to read:

    DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

And here is the code itself:

    public class OpenXMLHelper
    {
        // A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
        // of the worksheets.
        //
        // We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
        // OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more
        // stable method of reading in the data.
        //
        public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
        {
            DataTable dt = new DataTable(worksheetName);

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
            {
                // Find the sheet with the supplied name, and then use that
                // Sheet object to retrieve a reference to the first worksheet.
                Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
                if (theSheet == null)
                    throw new Exception("Couldn't find the worksheet: " + worksheetName);

                // Retrieve a reference to the worksheet part.
                WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
                Worksheet workSheet = wsPart.Worksheet;

                string dimensions = workSheet.SheetDimension.Reference.InnerText; // Get the dimensions of this worksheet, eg "B2:F4"

                int numOfColumns = 0;
                int numOfRows = 0;
                CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));

                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                string[,] cellValues = new string[numOfColumns, numOfRows];

                int colInx = 0;
                int rowInx = 0;
                string value = "";
                SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;

                // Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
                foreach (Row row in rows)
                {
                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        // *DON'T* assume there's going to be one XML element for each column in each row...
                        Cell cell = row.Descendants<Cell>().ElementAt(i);
                        if (cell.CellValue == null || cell.CellReference == null)
                            continue; // eg when an Excel cell contains a blank string

                        // Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                        colInx = GetColumnIndexByName(cell.CellReference); // eg "C" -> 2 (0-based)
                        rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1; // Needs to be 0-based

                        // Fetch the value in this cell
                        value = cell.CellValue.InnerXml;
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        {
                            value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                        }

                        cellValues[colInx, rowInx] = value;
                    }
                }

                // Copy the array of strings into a DataTable.
                // We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
                for (int col = 0; col < numOfColumns; col++)
                    dt.Columns.Add("Column_" + col.ToString());

                for (int row = 0; row < numOfRows; row++)
                {
                    DataRow dataRow = dt.NewRow();
                    for (int col = 0; col < numOfColumns; col++)
                    {
                        dataRow.SetField(col, cellValues[col, row]);
                    }
                    dt.Rows.Add(dataRow);
                }

    #if DEBUG
                // Write out the contents of our DataTable to the Output window (for debugging)
                string str = "";
                for (rowInx = 0; rowInx < numOfRows; rowInx++)
                {
                    for (colInx = 0; colInx < numOfColumns; colInx++)
                    {
                        object val = dt.Rows[rowInx].ItemArray[colInx];
                        str += (val == null) ? "" : val.ToString();
                        str += "\t";
                    }
                    str += "\n";
                }
                System.Diagnostics.Trace.WriteLine(str);
    #endif
                return dt;
            }
        }

        private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
        {
            // How many columns & rows of data does this Worksheet contain ?
            // We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
            // eg "B1:F4" -> we'll need 6 columns and 4 rows.
            //
            // (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
            try
            {
                string[] parts = dimensions.Split(':'); // eg "B1:F4"
                if (parts.Length != 2)
                    throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");

                numOfColumns = 1 + GetColumnIndexByName(parts[1]); // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
                numOfRows = GetRowIndexFromCellAddress(parts[1]);
            }
            catch
            {
                throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
            }
        }

        public static int GetRowIndexFromCellAddress(string cellAddress)
        {
            // Convert an Excel CellReference column into a 1-based row index
            // eg "D42" -> 42
            //    "F123" -> 123
            string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
            return int.Parse(rowNumber);
        }

        public static int GetColumnIndexByName(string cellAddress)
        {
            // Convert an Excel CellReference column into a 0-based column index
            // eg "D42" -> 3
            //    "F123" -> 5
            var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
            int number = 0, pow = 1;
            for (int i = columnName.Length - 1; i >= 0; i--)
            {
                number += (columnName[i] - 'A' + 1) * pow;
                pow *= 26;
            }
            return number - 1;
        }
    }
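One edge case worth knowing about: CalculateDataTableSize above throws when the dimension string has no colon. As far as I know, Excel typically writes a single reference such as "A1" for a sheet with only one used cell, and the dimension element is optional, so files produced by other tools may omit it entirely. Below is a rough sketch of a more tolerant parser; DimensionSketch and SizeFromDimension are hypothetical names, not part of the answer's code or of the SDK, and the caller would still need to handle a missing dimension.

```csharp
using System;

public static class DimensionSketch
{
    // Parses a dimension reference such as "B2:F4" or "A1" and returns how many
    // columns and rows are needed to cover everything from A1 to the
    // bottom-right cell of that range.
    public static (int Columns, int Rows) SizeFromDimension(string dimension)
    {
        // Without a ':' the single reference is also the bottom-right cell.
        string bottomRight = dimension.Contains(":") ? dimension.Split(':')[1] : dimension;

        int columns = 0, rows = 0;
        foreach (char ch in bottomRight)
        {
            if (char.IsLetter(ch))
                columns = columns * 26 + (char.ToUpperInvariant(ch) - 'A' + 1); // one-based column number
            else if (char.IsDigit(ch))
                rows = rows * 10 + (ch - '0'); // one-based row number
        }
        return (columns, rows);
    }
}
```

For "B2:F4" this gives 6 columns and 4 rows, matching the comment in CalculateDataTableSize, and for "A1" it gives a 1x1 table instead of throwing.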

Added yet another implementation, this time where the number of columns is known in advance:

    /// <summary>
    /// Gets a list of cells that are padded with empty cells where necessary.
    /// </summary>
    /// <param name="numberOfColumns">The number of columns expected.</param>
    /// <param name="cells">The cells.</param>
    /// <returns>List of padded cells</returns>
    private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
    {
        // Only perform the padding operation if existing column count is less than required
        if (cells.Count < numberOfColumns)
        {
            IList<Cell> padded = new List<Cell>();
            int cellIndex = 0;

            for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
            {
                if (cellIndex < cells.Count)
                {
                    // Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
                    string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());

                    // Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
                    int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;

                    // Add padding cells where current cell index is less than required
                    while (indexOfReference > paddedIndex)
                    {
                        padded.Add(new Cell());
                        paddedIndex++;
                    }

                    padded.Add(cells[cellIndex++]);
                }
                else
                {
                    // Add padding cells when passed existing cells
                    padded.Add(new Cell());
                }
            }

            return padded;
        }
        else
        {
            return cells;
        }
    }

Call using:

 IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList()); 

Where 38 is the required number of columns.