I recently had the need to access a SOAP API to obtain some data. SOAP works by posting an XML document to a site's URL in a format defined by the API's schema. The API then returns data, also in the form of an XML document. Based on this post, I figured suds was the easiest way to use Python to access the API, so I could programmatically query data repeatedly (and hence parallelize the queries). suds did turn out to be relatively easy to use:
```python
from suds.client import Client

url = 'http://www.ripedev.com/webservices/localtime.asmx?WSDL'
client = Client(url)
print client
client.service.LocalTimeByZipCode('90210')
```
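For context, what suds generates under the hood is a SOAP "envelope" posted to the endpoint. A hypothetical request body for the LocalTimeByZipCode call might look roughly like this (element names come from the service's WSDL; the namespace here is a guess):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- operation and parameter names are defined by the API's schema -->
    <LocalTimeByZipCode xmlns="http://www.ripedev.com/webservices/">
      <ZipCode>90210</ZipCode>
    </LocalTimeByZipCode>
  </soap:Body>
</soap:Envelope>
```

suds builds this for you from the WSDL, which is exactly why it is convenient.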
This worked on my home network. At work, I had to go through a proxy in order to access the outside world; otherwise, I'd get a connection-refused message: urllib2.URLError: &lt;urlopen error [Errno 111] Connection refused&gt;. The modification to use a proxy was straightforward:
```python
from suds.client import Client

proxy = {'http': 'proxy_username:proxy_password@proxy_server.com:port'}
url = 'http://www.ripedev.com/webservices/localtime.asmx?WSDL'
# client = Client(url)
client = Client(url, proxy=proxy)
print client
client.service.LocalTimeByZipCode('90210')
```
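The proxy_username:proxy_password@proxy_server.com:port form embeds HTTP Basic credentials in the proxy URL; under the hood, the client base64-encodes them into a Proxy-Authorization header. A minimal sketch of that encoding (the credentials are placeholders, and the helper name is my own):

```python
import base64

def proxy_auth_header(user, password):
    # HTTP Basic auth: base64("user:password"), sent to the proxy as
    # "Proxy-Authorization: Basic <token>"
    token = base64.b64encode(('%s:%s' % (user, password)).encode('ascii')).decode('ascii')
    return 'Basic ' + token

print(proxy_auth_header('proxy_username', 'proxy_password'))
```

Note that base64 is an encoding, not encryption, so these credentials are effectively sent in the clear to the proxy.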
The previous examples used a public SOAP API I found online. Now, the site I actually wanted to hit uses SSL for encryption (i.e., an https site) and requires authentication. I thought the fix would be as simple as:
```python
from suds.client import Client

proxy = {'https': 'proxy_username:proxy_password@proxy_server.com:port'}
url = 'https://some_server.com/path/to/soap_api?wsdl'
un = 'site_username'
pw = 'site_password'
# client = Client(url)
client = Client(url, proxy=proxy, username=un, password=pw)
print client
client.service.someFunction(args)
```
However, I got the error message: Exception: (404, u'/path/to/soap_api'). Very weird to me. Is it an authentication issue? A proxy issue? If a proxy issue, how so, since my earlier toy example worked through the proxy? I tried the same site on my home network, where there is no firewall, and things worked:
```python
from suds.client import Client

url = 'https://some_server.com/path/to/soap_api?wsdl'
un = 'site_username'
pw = 'site_password'
# client = Client(url)
client = Client(url, username=un, password=pw)
print client
client.service.someFunction(args)
```
Conclusion? It must be a proxy issue with https. I used the following prior to calling suds to help with debugging:
```python
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.client').setLevel(logging.DEBUG)
logging.getLogger('suds.transport').setLevel(logging.DEBUG)
logging.getLogger('suds.xsd.schema').setLevel(logging.DEBUG)
logging.getLogger('suds.wsdl').setLevel(logging.DEBUG)
```
My initial thoughts after some debugging: there must be something wrong with the proxy, as the log shows Python sending the request to the target url, but the response reports the path (minus the domain name) as not found. What happened to the domain name? I notified the firewall team to look into this, as it appeared the proxy was modifying something (an incomplete url?). The firewall team investigated and found that the proxy was returning a warning that the ClientHello message is too large. That was one clue. The log also showed that the user was never authenticated and that the ssl handshake was never completed. My thought: still a proxy issue, as the Python code works at home. However, the proxy team was able to reach the https SOAP API through the proxy using the SOA Client plugin for Firefox. That convinced me that something else might be the culprit.
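For an https URL, the client is supposed to first ask the proxy to open a raw tunnel with a CONNECT request, and only start the TLS handshake (the ClientHello) after the proxy answers "200 Connection established". If the CONNECT step is mishandled, the handshake never completes, which matches what the log showed. A minimal sketch of that exchange on the wire (the helper names are mine, and no network is involved here):

```python
def build_connect_request(host, port):
    # The very first thing sent to the proxy for an https target.
    # TLS bytes only flow after the proxy replies 200.
    return 'CONNECT %s:%d HTTP/1.0\r\n\r\n' % (host, port)

def parse_proxy_status(reply):
    # First line of the proxy's reply, e.g. "HTTP/1.0 200 Connection established"
    version, code, message = reply.split('\r\n')[0].split(' ', 2)
    return int(code), message

print(build_connect_request('some_server.com', 443))
print(parse_proxy_status('HTTP/1.0 200 Connection established\r\n\r\n'))
```

This is exactly the dance the ActiveState recipe below implements by hand on top of httplib.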
I googled for help and thought this would be helpful:
```python
import urllib2
import urllib
import httplib
import socket


class ProxyHTTPConnection(httplib.HTTPConnection):
    _ports = {'http': 80, 'https': 443}

    def request(self, method, url, body=None, headers={}):
        # request is called before connect, so we can interpret the url and get the
        # real host/port to be used to make the CONNECT request to the proxy
        proto, rest = urllib.splittype(url)
        if proto is None:
            raise ValueError, "unknown URL type: %s" % url
        # get host
        host, rest = urllib.splithost(rest)
        # try to get port
        host, port = urllib.splitport(host)
        # if port is not defined, try to get it from the protocol
        if port is None:
            try:
                port = self._ports[proto]
            except KeyError:
                raise ValueError, "unknown protocol for: %s" % url
        self._real_host = host
        self._real_port = port
        httplib.HTTPConnection.request(self, method, url, body, headers)

    def connect(self):
        httplib.HTTPConnection.connect(self)
        # send proxy CONNECT request
        self.send("CONNECT %s:%d HTTP/1.0\r\n\r\n" % (self._real_host, self._real_port))
        # expect an HTTP/1.0 200 Connection established response
        response = self.response_class(self.sock, strict=self.strict, method=self._method)
        (version, code, message) = response._read_status()
        # probably here we can handle auth requests...
        if code != 200:
            # proxy returned an error: abort connection and raise exception
            self.close()
            raise socket.error, "Proxy connection failed: %d %s" % (code, message.strip())
        # eat up the header block from the proxy
        while True:
            # should probably not use fp directly
            line = response.fp.readline()
            if line == '\r\n':
                break


class ProxyHTTPSConnection(ProxyHTTPConnection):
    default_port = 443

    def __init__(self, host, port=None, key_file=None, cert_file=None,
                 strict=None, timeout=0):  # vinh added timeout
        ProxyHTTPConnection.__init__(self, host, port)
        self.key_file = key_file
        self.cert_file = cert_file

    def connect(self):
        ProxyHTTPConnection.connect(self)
        # make the sock ssl-aware
        ssl = socket.ssl(self.sock, self.key_file, self.cert_file)
        self.sock = httplib.FakeSocket(self.sock, ssl)


class ConnectHTTPHandler(urllib2.HTTPHandler):
    def do_open(self, http_class, req):
        return urllib2.HTTPHandler.do_open(self, ProxyHTTPConnection, req)


class ConnectHTTPSHandler(urllib2.HTTPSHandler):
    def do_open(self, http_class, req):
        return urllib2.HTTPSHandler.do_open(self, ProxyHTTPSConnection, req)


from suds.client import Client
# from httpsproxy import ConnectHTTPSHandler, ConnectHTTPHandler  # the classes above
import urllib2, urllib
from suds.transport.http import HttpTransport

opener = urllib2.build_opener(ConnectHTTPHandler, ConnectHTTPSHandler)
urllib2.install_opener(opener)
t = HttpTransport()
t.urlopener = opener
url = 'https://some_server.com/path/to/soap_api?wsdl'
proxy = {'https': 'proxy_username:proxy_password@proxy_server.com:port'}
un = 'site_username'
pw = 'site_password'
client = Client(url=url, transport=t, proxy=proxy, username=un, password=pw)
# some sites suggest specifying location:
client = Client(url=url, transport=t, proxy=proxy, username=un, password=pw,
                location='https://some_server.com/path/to/soap_api?wsdl')
```
This too did not work. I continued to google and found that lots of people have issues with https and proxies. I knew suds depends on urllib2, so I googled about that as well; people have trouble with urllib2 when it comes to https and proxies too. I then decided to investigate using urllib2 directly to contact the https url through the proxy:
```python
## http://stackoverflow.com/questions/5227333/xml-soap-post-error-what-am-i-doing-wrong
## http://stackoverflow.com/questions/34079/how-to-specify-an-authenticated-proxy-for-a-python-http-connect

### at home this works
import urllib2
url = 'https://some_server.com/path/to/soap_api?wsdl'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, uri=url, user='site_username', passwd='site_password')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
page = urllib2.urlopen(url)
page.read()

### on the work network, this does not work:
url = 'https://some_server.com/path/to/soap_api?wsdl'
proxy = urllib2.ProxyHandler({'https': 'proxy_username:proxy_password@proxy_server.com:port',
                              'http': 'proxy_username:proxy_password@proxy_server.com:port'})
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, uri=url, user='site_username', passwd='site_password')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(proxy, auth_handler, urllib2.HTTPSHandler)
urllib2.install_opener(opener)
page = urllib2.urlopen(url)

### also tried the above with the custom handler defined in the previous code chunk
### (http://code.activestate.com/recipes/456195/) running first (run the list of classes)
```
No luck. I re-read this post that I had run into before, and really agreed that urllib2 is severely flawed, especially when using an https proxy. At the end of the page, the author suggested using the requests package. I tried it out, and I was able to connect through the https proxy:
```python
import requests
import xmltodict

p1 = 'http://proxy_username:proxy_password@proxy_server.com:port'
p2 = 'https://proxy_username:proxy_password@proxy_server.com:port'
proxy = {'http': p1, 'https': p2}
site = 'https://some_server.com/path/to/soap_api?wsdl'

r = requests.get(site, proxies=proxy, auth=('site_username', 'site_password'))
r.text  # works

soap_xml_in = """<?xml version="1.0" encoding="UTF-8"?>
...
"""
headers = {'SOAPAction': u'""', 'Content-Type': 'text/xml; charset=utf-8'}
soap_xml_out = requests.post(site, data=soap_xml_in, headers=headers,
                             proxies=proxy, auth=('site_username', 'site_password')).text
```
My learnings?

- suds is great for accessing SOAP, just not when you have to access an https site through a firewall.
- urllib2 is severely flawed; things only work in very standard situations.
- The requests package is very powerful and just works. Even though I have to deal with raw xml files as opposed to leveraging suds' pythonic structures, the xmltodict package translates the xml into dictionaries, so extracting the relevant data adds only marginal effort.
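To illustrate the kind of dictionary-style extraction xmltodict enables, here is a crude stand-in built only on the stdlib (the SOAP response below is invented for the example, and xml_to_dict is my own toy; the real xmltodict handles attributes, repeated tags, and namespaces properly):

```python
import xml.etree.ElementTree as ET

def xml_to_dict(elem):
    # toy version of xmltodict.parse: leaf elements become their text,
    # non-leaves become {tag: value} dicts (ignores attributes and repeats)
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: xml_to_dict(child) for child in children}

# hypothetical SOAP response for the earlier LocalTimeByZipCode call
soap_xml_out = ('<Envelope><Body><LocalTimeByZipCodeResponse>'
                '<LocalTimeByZipCodeResult>2013-01-01T12:00:00</LocalTimeByZipCodeResult>'
                '</LocalTimeByZipCodeResponse></Body></Envelope>')

doc = xml_to_dict(ET.fromstring(soap_xml_out))
print(doc['Body']['LocalTimeByZipCodeResponse']['LocalTimeByZipCodeResult'])
```

With xmltodict the access pattern is the same chained dictionary indexing, which is what makes the extra effort over suds marginal.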
NOTE: I had to install libuuid-devel in cygwin64 because I was getting an installation error.