How to get xpath from an XmlNode instance

Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?

Thanks!

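Okay, I couldn't resist having a go at it. It only works for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise, there may well be a cleaner way of doing it.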

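Including the index on every element (particularly the root one) is superfluous, but it's easier than trying to work out whether there would be any ambiguity without it.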

    using System;
    using System.Text;
    using System.Xml;

    class Test
    {
        static void Main()
        {
            string xml = @"
    <root>
        <foo />
        <foo>
            <bar attr='value'/>
            <bar other='va' />
        </foo>
        <foo><bar /></foo>
    </root>";
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            XmlNode node = doc.SelectSingleNode("//@attr");
            Console.WriteLine(FindXPath(node));
            // Round trip: the generated path should select the same node again
            Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
        }

        static string FindXPath(XmlNode node)
        {
            StringBuilder builder = new StringBuilder();
            // Walk up from the node to the document, prepending one step per level
            while (node != null)
            {
                switch (node.NodeType)
                {
                    case XmlNodeType.Attribute:
                        builder.Insert(0, "/@" + node.Name);
                        node = ((XmlAttribute) node).OwnerElement;
                        break;
                    case XmlNodeType.Element:
                        int index = FindElementIndex((XmlElement) node);
                        builder.Insert(0, "/" + node.Name + "[" + index + "]");
                        node = node.ParentNode;
                        break;
                    case XmlNodeType.Document:
                        return builder.ToString();
                    default:
                        throw new ArgumentException("Only elements and attributes are supported");
                }
            }
            throw new ArgumentException("Node was not in a document");
        }

        static int FindElementIndex(XmlElement element)
        {
            XmlNode parentNode = element.ParentNode;
            if (parentNode is XmlDocument)
            {
                return 1;
            }
            XmlElement parent = (XmlElement) parentNode;
            // 1-based position among siblings that share the element's name
            int index = 1;
            foreach (XmlNode candidate in parent.ChildNodes)
            {
                if (candidate is XmlElement && candidate.Name == element.Name)
                {
                    if (candidate == element)
                    {
                        return index;
                    }
                    index++;
                }
            }
            throw new ArgumentException("Couldn't find element within parent");
        }
    }

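Jon is right that any number of XPath expressions will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node's position in the predicate, e.g.: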

    /node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

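Obviously, this expression doesn't use element names, but if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes are not children of their element and have no position; you can only find them by name), but it will find all other node types.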

To build this expression, you need to write a method that returns a node's position among its parent's child nodes, because XmlNode doesn't expose that as a property:

    static int GetNodePosition(XmlNode child)
    {
        for (int i = 0; i < child.ParentNode.ChildNodes.Count; i++)
        {
            if (child.ParentNode.ChildNodes[i] == child)
            {
                // tricksy XPath, not starting its positions at 0 like a normal language
                return i + 1;
            }
        }
        throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
    }

(There's probably a more elegant way to do this with LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
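For what it's worth, a LINQ version might look something like the sketch below. The names `NodePositionSketch` and `GetNodePositionLinq` and the sample XML are made up for illustration. Because XmlNodeList implements only the non-generic IEnumerable, it needs a `Cast<XmlNode>()` before the LINQ operators become available.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class NodePositionSketch
{
    // Hypothetical LINQ counterpart to GetNodePosition: counts the siblings
    // that come before the node and adds 1, since XPath positions are 1-based.
    public static int GetNodePositionLinq(XmlNode child)
    {
        if (child.ParentNode == null)
        {
            throw new InvalidOperationException("Node has no parent.");
        }
        return child.ParentNode.ChildNodes
            .Cast<XmlNode>()                                        // non-generic IEnumerable -> IEnumerable<XmlNode>
            .TakeWhile(sibling => !ReferenceEquals(sibling, child)) // everything before the node itself
            .Count() + 1;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><a/><!-- note --><b/></root>");
        XmlNode b = doc.SelectSingleNode("/root/b");
        // The comment counts as a node, so <b/> is the third child of <root>
        Console.WriteLine(GetNodePositionLinq(b)); // prints 3
    }
}
```

Whether this is actually nicer than the plain loop is a matter of taste; it does the same linear scan.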

Then you can write a recursive method like this:

    static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            GetNodePosition(node)
            );
    }

As you can see, I hacked in a way for it to find attributes as well.

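Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm about to make is an important one for anyone who works with XML to think about.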

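I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from how these developers mainly use XML: all the XML they're used to is structured this way. You can spot these developers because they use the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)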

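This feels like a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types uniformly.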

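There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"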

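I know, old post, but the version I liked most (the one with names) was flawed: when a parent node has children with different names, it stops counting the index as soon as it hits the first sibling whose name doesn't match.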

Here is my fixed version:

    /// <summary>
    /// Gets the X-Path to a given Node
    /// </summary>
    /// <param name="node">The Node to get the X-Path from</param>
    /// <returns>The X-Path of the Node</returns>
    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        // Get the Index
        int indexInParent = 1;
        XmlNode siblingNode = node.PreviousSibling;
        // Loop thru all Siblings
        while (siblingNode != null)
        {
            // Increase the Index if the Sibling has the same Name
            if (siblingNode.Name == node.Name)
            {
                indexInParent++;
            }
            siblingNode = siblingNode.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is its position among the siblings that share its name.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
    }

My 10p worth is a hybrid of Robert's and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }
        // the path to a node is the path to its parent, plus "/node()[n]", where
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

Here's a simple method that I've used; it worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        // ".." selects the parent (for an attribute, its owner element)
        return GetXpath(node.SelectSingleNode("..")) + "/"
            + (node.NodeType == XmlNodeType.Attribute ? "@" : String.Empty)
            + node.Name;
    }

There's no such thing as "the" xpath of a node. For any given node there may well be many xpath expressions that match it.
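To make that concrete, here is a small sketch in which three different expressions all select the same node. The document, the class name `ManyPathsSketch` and the three expressions are chosen purely for illustration.

```csharp
using System;
using System.Xml;

public static class ManyPathsSketch
{
    // Illustrative expressions only: by element name and index, by attribute
    // value, and by pure node position.
    public static readonly string[] Candidates =
    {
        "/root/foo[2]/bar",
        "//bar[@attr='value']",
        "/node()[1]/node()[2]/node()[1]"
    };

    // Returns true when every candidate expression selects the very same node.
    public static bool AllSelectSameNode()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");

        XmlNode target = doc.SelectSingleNode(Candidates[0]);
        if (target == null)
        {
            return false;
        }
        foreach (string xpath in Candidates)
        {
            if (!ReferenceEquals(doc.SelectSingleNode(xpath), target))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllSelectSameNode()); // prints True
    }
}
```

So any "get the xpath" routine is really picking one convention out of many.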

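You could probably work up the tree to build an expression that matches it, taking into account the index of particular elements and so on, but it's not going to be terribly nice code.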

Why do you need this? There may be a better solution.

If you do it this way, you get a path containing both the node names and their positions, for example: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

    public string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }

        //get the index
        // Note: this counts only the unbroken run of same-named siblings directly
        // before the node; counting stops at the first sibling with a different name.
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
        {
            iIndex++;
            xnIndex = xnIndex.PreviousSibling;
        }

        // the path to a node is the path to its parent, plus "/name[n]", where
        // n is the index computed above.
        return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
    }

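I found that none of the above worked with XDocument, so I wrote my own code to support XDocument, using recursion. I think this code handles multiple identical nodes better than some of the other code here, because it first tries to go as deep into the XML path as it can and then backs up, building only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create it. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed those; it would be easy to add support for them, though.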

    Imports System.Text.RegularExpressions
    Imports System.Xml.Linq
    Imports System.Xml.XPath

    Private Sub NodeItterate(XDoc As XElement, XPath As String)
        'get the deepest path
        Dim nodes As IEnumerable(Of XElement)
        nodes = XDoc.XPathSelectElements(XPath)

        'if it doesn't exist, try the next shallow path
        If nodes.Count = 0 Then
            NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
            'by this time all the required parent elements will have been constructed
            Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
            Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
            Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
            ParentNode.Add(New XElement(NewElementName))
        End If

        'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
        If nodes.Count > 1 Then
            Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
        End If

        'if there is just one element, we can proceed
        If nodes.Count = 1 Then
            'just proceed
        End If
    End Sub

    Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)
        If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
            Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
        End If

        'reject any character that could start or belong to a predicate
        If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
            Throw New ArgumentException("Can't create a path based on predicates.")
        End If

        'we will process this recursively.
        NodeItterate(XDoc, XPath)
    End Sub

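How about using class extensions? ;) My version (building on others' work) uses the syntax name[index], with the index omitted if the element has no same-named "siblings". The loop that gets the element index lives in a separate routine (also a class extension).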

Just paste the following into any static utility class (or the main Program class, if it is static):

    static public int GetRank( this XmlNode node )
    {
        // return 0 if unique, else return position 1...n in siblings with same name
        try
        {
            if( node is XmlElement )
            {
                int rank = 1;
                bool alone = true, found = false;

                foreach( XmlNode n in node.ParentNode.ChildNodes )
                    if( n.Name == node.Name ) // sibling with same name
                    {
                        if( n.Equals(node) )
                        {
                            if( ! alone ) return rank; // no need to continue
                            found = true;
                        }
                        else
                        {
                            if( found ) return rank; // no need to continue
                            alone = false;
                            rank++;
                        }
                    }
            }
        }
        catch{}
        return 0;
    }

    static public string GetXPath( this XmlNode node )
    {
        try
        {
            if( node is XmlAttribute )
                return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

            if( node is XmlText || node is XmlCDataSection )
                return node.ParentNode.GetXPath();

            if( node.ParentNode == null ) // the only node with no parent is the root node, which has no path
                return "";

            int rank = node.GetRank();
            if( rank == 0 )
                return String.Format( "{0}/{1}", node.ParentNode.GetXPath(), node.Name );
            else
                return String.Format( "{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank );
        }
        catch{}
        return "";
    }

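I produced VBA for Excel to do this. It outputs tuples of an XPath and the associated text of the element or attribute. The purpose was to let business analysts identify and map some XML. I appreciate that this is a C# forum, but thought it might be of interest.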

    Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)
        Dim chnode As IXMLDOMNode
        Dim attr As IXMLDOMAttribute
        Dim oXString As String
        Dim chld As Long
        Dim idx As Variant
        Dim addindex As Boolean

        chld = 0
        idx = 0
        addindex = False

        'determine the node type:
        Select Case inode.NodeType
            Case NODE_ELEMENT
                If inode.ParentNode.NodeType = NODE_DOCUMENT Then
                    'This gets the root node name but ignores all the namespace attributes
                    oXString = iXstring & "//" & fp(inode.nodename)
                Else
                    'Need to deal with indexing. Where an element has siblings with the same nodeName,
                    'it needs to be indexed using [index], eg swapstreams or schedules
                    For Each chnode In inode.ParentNode.ChildNodes
                        If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
                    Next chnode

                    If chld > 1 Then
                        '//inode has siblings of the same nodeName, so needs to be indexed
                        'Lookup the index from the indexes array
                        idx = getIndex(inode.nodename, indexes)
                        addindex = True
                    Else
                    End If

                    'build the XString
                    oXString = iXstring & "/" & fp(inode.nodename)
                    If addindex Then oXString = oXString & "[" & idx & "]"

                    'If type is element then check for attributes
                    For Each attr In inode.Attributes
                        'If the element has attributes then extract the data pair
                        'XString + Element.Name, @Attribute.Name=Attribute.Value
                        Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
                    Next attr
                End If
            Case NODE_TEXT
                'build the XString
                oXString = iXstring
                Call oSheet(oSh, oXString, inode.NodeValue)
            Case NODE_ATTRIBUTE
                'Do nothing
            Case NODE_CDATA_SECTION
                'Do nothing
            Case NODE_COMMENT
                'Do nothing
            Case NODE_DOCUMENT
                'Do nothing
            Case NODE_DOCUMENT_FRAGMENT
                'Do nothing
            Case NODE_DOCUMENT_TYPE
                'Do nothing
            Case NODE_ENTITY
                'Do nothing
            Case NODE_ENTITY_REFERENCE
                'Do nothing
            Case NODE_INVALID
                'do nothing
            Case NODE_NOTATION
                'do nothing
            Case NODE_PROCESSING_INSTRUCTION
                'do nothing
        End Select

        'Now call Parse2 on each of inode's children.
        If inode.HasChildNodes Then
            For Each chnode In inode.ChildNodes
                Call Parse2(oSh, chnode, oXString, indexes)
            Next chnode
            Set chnode = Nothing
        Else
        End If
    End Sub

The element counts are managed with the following:

    Function getIndex(tag As Variant, indexes) As Variant
        'Function to get the latest index for an xml tag from the indexes array
        'indexes array is passed from one parser function to the next up and down the tree
        Dim i As Integer
        Dim n As Integer

        If IsArrayEmpty(indexes) Then
            ReDim indexes(1, 0)
            indexes(0, 0) = "Tag"
            indexes(1, 0) = "Index"
        Else
        End If

        For i = 0 To UBound(indexes, 2)
            If indexes(0, i) = tag Then
                'tag found, increment and return the index then exit
                'also destroy all recorded tag names BELOW that level
                indexes(1, i) = indexes(1, i) + 1
                getIndex = indexes(1, i)
                ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
                Exit Function
            Else
            End If
        Next i

        'tag not found so add the tag with index 1 at the end of the array
        n = UBound(indexes, 2)
        ReDim Preserve indexes(1, n + 1)
        indexes(0, n + 1) = tag
        indexes(1, n + 1) = 1
        getIndex = 1
    End Function

This is even easier:

    ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I dont want to know that it was a generic document
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Percolate up getting the count of all of this node's sibling before it.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Mark this node's index, if it has one
        ' Also mark the index to 1 or the default if it does have a sibling just no previous.
        ' Note: previousSiblings starts at 1, so this condition always holds and the index is always emitted.
        temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
    End Function

Another solution to your problem could be to "mark" the XmlNodes you will want to identify later with a custom attribute:

    var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
    id.Value = Guid.NewGuid().ToString();
    _currentNode.Attributes.Append(id);

You can store the value in a Dictionary, for example. Later you can identify the node with an xpath query:

    // id is the XmlAttribute created above, so pass its Value rather than the attribute object
    newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id.Value));

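I know this is not a direct answer to your question, but it can help if the reason you want a node's xpath is to have a way of "reaching" the node later, after you have lost the reference to it in code.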

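This also overcomes the problems that arise when elements are added to or moved within the document, which can break the xpath (or the indexes, as suggested in other answers).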
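Here is a rough end-to-end sketch of the idea. The helper names `Tag` and `FindByTag`, the class name and the sample document are made up for illustration, and the lookup uses an exact attribute match rather than contains(). It tags a node, inserts a new sibling so that a positional path stops matching, and then finds the node again by its tag.

```csharp
using System;
using System.Xml;

public static class TagNodeSketch
{
    // Hypothetical helper: stamps the element with a fresh GUID and returns it.
    public static string Tag(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute("some_id", id);
        return id;
    }

    // Hypothetical helper: looks the element up again by its tag value.
    public static XmlNode FindByTag(XmlDocument doc, string id)
    {
        return doc.SelectSingleNode(string.Format("//*[@some_id='{0}']", id));
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar/></foo></root>");

        var bar = (XmlElement)doc.SelectSingleNode("/root/foo[2]/bar");
        string id = Tag(bar);

        // A new first child shifts every index, so the old positional path breaks
        doc.DocumentElement.PrependChild(doc.CreateElement("foo"));

        Console.WriteLine(doc.SelectSingleNode("/root/foo[2]/bar") == null); // prints True
        Console.WriteLine(ReferenceEquals(FindByTag(doc, id), bar));         // prints True
    }
}
```

The obvious cost is that the document itself gets modified, which may or may not be acceptable in your case.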

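    public static string GetFullPath(this XmlNode node)
    {
        if (node.ParentNode == null)
        {
            // the document node (or a detached node) contributes nothing to the path
            return "";
        }
        else
        {
            // XPath steps are separated by "/", and each step is the node's own name
            return $"{GetFullPath(node.ParentNode)}/{node.Name}";
        }
    }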