diff --git a/README.md b/README.md index 3a1a2f6..371459b 100644 --- a/README.md +++ b/README.md @@ -19,53 +19,18 @@ or use these lists for other applications like selektor. So we make two files that are structured in YAML: ``` /etc/tor/yaml/torrc-goodnodes.yaml - ---- GoodNodes: - EntryNodes: [] Relays: - # ExitNodes will be overwritten by this program - ExitNodes: [] - IntroductionPoints: [] - # use the Onions section to list onion services you want the - # Introduction Points whitelisted - these points may change daily - # Look in tor's notice.log for 'Every introduction point for service' - Onions: [] - # use the Services list to list elays you want the whitelisted - # Look in tor's notice.log for 'Wanted to contact directory mirror' - Services: [] - - + IntroductionPoints: + - NODEFINGERPRINT + ... By default all sections of the goodnodes.yaml are used as a whitelist. -Use the GoodNodes/Onions list to list onion services you want the -Introduction Points whitelisted - these points may change daily -Look in tor's notice.log for warnings of 'Every introduction point for service' - -```--hs_dir``` ```default='/var/lib/tor'``` will make the program -parse the files named ```hostname``` below this dir to find -Hidden Services to whitelist. - -The Introduction Points can change during the day, so you may want to -rerun this program to freshen the list of Introduction Points. A full run -that processes all the relays from stem can take 30 minutes, or run with: - -```--saved_only``` will run the program with just cached information -on the relats, but will update the Introduction Points from the Services. - /etc/tor/yaml/torrc-badnodes.yaml - BadNodes: - # list the internet domains you know are bad so you don't - # waste time trying to download contacts from them. - ExcludeDomains: [] - ExcludeNodes: - # BadExit will be overwritten by this program - BadExit: [] - # list MyBadExit in --bad_sections if you want it used, to exclude nodes - # or any others as a list separated by comma(,) - MyBadExit: [] - + ExcludeExitNodes: + BadExit: + # $0000000000000000000000000000000000000007 ``` That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML) https://github.com/yaml/pyyaml/ or ```ruamel```: do @@ -74,7 +39,7 @@ the advantage of the former is that it preserves comments. (You may have to run this as the Tor user to get RW access to /run/tor/control, in which case the directory for the YAML files must -be group Tor writeable, and its parent's directories group Tor RX.) +be group Tor writeable, and its parents group Tor RX.) Because you don't want to exclude the introduction points to any onion you want to connect to, ```--white_onions``` should whitelist the @@ -82,13 +47,6 @@ introduction points to a comma sep list of onions; we fixed stem to do this: * https://github.com/torproject/stem/issues/96 * https://gitlab.torproject.org/legacy/trac/-/issues/25417 -Use the GoodNodes/Onions list in goodnodes.yaml to list onion services -you want the Introduction Points whitelisted - these points may change daily. -Look in tor's notice.log for 'Every introduction point for service' - -```notice_log``` will parse the notice log for warnings about relays and -services that will then be whitelisted. - ```--torrc_output``` will write the torrc ExcludeNodes configuration to a file. ```--good_contacts``` will write the contact info as a ciiss dictionary @@ -113,7 +71,7 @@ list of fingerprints to ```ExitNodes```, a whitelist of relays to use as exits. 3. clean relays that don't have "good' contactinfo. 
(implies 1) ```=Empty,NoEmail,NotGood``` -The default is ```Empty,NoEmail,NotGood``` ; ```NoEmail``` is inherently imperfect +The default is ```=Empty,NotGood``` ; ```NoEmail``` is inherently imperfect in that many of the contact-as-an-email are obfuscated, but we try anyway. To be "good" the ContactInfo must: @@ -122,20 +80,81 @@ To be "good" the ContactInfo must: 3. must support getting the file with a valid SSL cert from a recognized authority 4. (not in the spec but added by Python) must use a TLS SSL > v1 5. must have a fingerprint list in the file -6. must have the FP that got us the contactinfo in the fingerprint list in the file. - -```--wait_boot``` is the number of seconds to wait for Tor to booststrap - -```--wellknown_output``` will make the program write the well-known files -(```/.well-known/tor-relay/rsa-fingerprint.txt```) to a directory. - -```--torrc_output``` will write a file of the commands that it sends to -the Tor controller, so you can include it in a ```/etc/toc/torrc```. - -```--relays_output write the download relays in json to a file. The relays -are downloaded from https://onionoo.torproject.org/details +6. must have the FP that got us the contactinfo in the fingerprint list in the file, For usage, do ```python3 exclude_badExits.py --help` -See [exclude_badExits.txt](./exclude_badExits.txt) + +## Usage +``` + +usage: exclude_badExits.py [-h] [--https_cafile HTTPS_CAFILE] + [--proxy_host PROXY_HOST] [--proxy_port PROXY_PORT] + [--proxy_ctl PROXY_CTL] [--torrc TORRC] + [--timeout TIMEOUT] [--good_nodes GOOD_NODES] + [--bad_nodes BAD_NODES] [--bad_on BAD_ON] + [--bad_contacts BAD_CONTACTS] + [--strict_nodes {0,1}] [--wait_boot WAIT_BOOT] + [--points_timeout POINTS_TIMEOUT] + [--log_level LOG_LEVEL] + [--bad_sections BAD_SECTIONS] + [--white_onions WHITE_ONIONS] + [--torrc_output TORRC_OUTPUT] + [--relays_output RELAYS_OUTPUT] + [--good_contacts GOOD_CONTACTS] + +optional arguments: + -h, --help show this help message and exit + --https_cafile HTTPS_CAFILE + Certificate Authority file (in PEM) + --proxy_host PROXY_HOST, --proxy-host PROXY_HOST + proxy host + --proxy_port PROXY_PORT, --proxy-port PROXY_PORT + proxy control port + --proxy_ctl PROXY_CTL, --proxy-ctl PROXY_CTL + control socket - or port + --torrc TORRC torrc to check for suggestions + --timeout TIMEOUT proxy download connect timeout + --good_nodes GOOD_NODES + Yaml file of good info that should not be excluded + --bad_nodes BAD_NODES + Yaml file of bad nodes that should also be excluded + --bad_on BAD_ON comma sep list of conditions - Empty,NoEmail,NotGood + --bad_contacts BAD_CONTACTS + Yaml file of bad contacts that bad FPs are using + --strict_nodes {0,1} Set StrictNodes: 1 is less anonymous but more secure, + although some sites may be unreachable + --wait_boot WAIT_BOOT + Seconds to wait for Tor to booststrap + --points_timeout POINTS_TIMEOUT + Timeout for getting introduction points - must be long + >120sec. 0 means disabled looking for IPs + --log_level LOG_LEVEL + 10=debug 20=info 30=warn 40=error + --bad_sections BAD_SECTIONS + sections of the badnodes.yaml to use, comma separated, + '' BROKEN + --white_onions WHITE_ONIONS + comma sep. 
list of onions to whitelist their + introduction points - BROKEN + --torrc_output TORRC_OUTPUT + Write the torrc configuration to a file + --relays_output RELAYS_OUTPUT + Write the download relays in json to a file + --good_contacts GOOD_CONTACTS + Write the proof data of the included nodes to a YAML + file + +This extends nusenu's basic idea of using the stem library to dynamically +exclude nodes that are likely to be bad by putting them on the ExcludeNodes or +ExcludeExitNodes setting of a running Tor. * +https://github.com/nusenu/noContactInfo_Exit_Excluder * +https://github.com/TheSmashy/TorExitRelayExclude The basic idea is to exclude +Exit nodes that do not have ContactInfo: * +https://github.com/nusenu/ContactInfo-Information-Sharing-Specification That +can be extended to relays that do not have an email in the contact, or to +relays that do not have ContactInfo that is verified to include them. + +``` + diff --git a/exclude_badExits.bash b/exclude_badExits.bash index e08ba05..fd8eafb 100644 --- a/exclude_badExits.bash +++ b/exclude_badExits.bash @@ -3,36 +3,23 @@ PROG=exclude_badExits.py SOCKS_PORT=9050 -SOCKS_HOST=127.0.0.1 CAFILE=/etc/ssl/certs/ca-certificates.crt -# you may have a special python for installed packages -EXE=`which python3.bash` -$EXE exclude_badExits.py --help > exclude_badExits.txt & -$EXE -c 'from exclude_badExits import __doc__; print(__doc__)' >exclude_badExits.md # an example of running exclude_badExits with full debugging -# expected to 20 minutes or so +# expected to take an hour or so declare -a LARGS LARGS=( - # --saved_only - # --strict_nodes 1 - --points_timeout 150 --log_level 10 + ) +# you may have a special python for installed packages +EXE=`which python3.bash` +LARGS+=( + --strict_nodes 1 + --points_timeout 120 + --proxy-host 127.0.0.1 + --proxy-port $SOCKS_PORT --https_cafile $CAFILE ) -[ -z "$socks_proxy" ] || \ -LARGS+=( - --proxy-host $SOCKS_HOST - --proxy-port $SOCKS_PORT -) - -if [ -f /var/lib/tor/.SelekTOR/3xx/cache/9050/notice.log ] ; then - LARGS+=(--notice_log /var/lib/tor/.SelekTOR/3xx/cache/9050/notice.log) -fi - -if [ -d /var/lib/tor/hs ] ; then - LARGS+=( --hs_dir /var/lib/tor/hs ) -fi if [ -f '/run/tor/control' ] ; then LARGS+=(--proxy-ctl '/run/tor/control' ) @@ -47,9 +34,8 @@ LARGS+=( --white_onions $ddg ) # you may need to be the tor user to read /run/tor/control grep -q ^debian-tor /etc/group && TORU=debian-tor || { grep -q ^tor /etc/group && TORU=tor - } -# --saved_only -sudo -u $TORU $EXE exclude_badExits.py "${LARGS[@]}" "$@" \ +} +sudo -u $TORU $EXE exclude_badExits.py "${LARGS[@]}" \ 2>&1|tee exclude_badExits6.log # The DEBUG statements contain the detail of why the relay was considered bad. diff --git a/exclude_badExits.py b/exclude_badExits.py index 19d100e..ad72d6d 100644 --- a/exclude_badExits.py +++ b/exclude_badExits.py @@ -17,37 +17,7 @@ or to relays that do not have ContactInfo that is verified to include them. 
""" __prolog__ = __doc__ -sGOOD_NODES = """ ---- -GoodNodes: - EntryNodes: [] - Relays: - # ExitNodes will be overwritten by this program - ExitNodes: [] - IntroductionPoints: [] - # use the Onions section to list onion services you want the - # Introduction Points whitelisted - these points may change daily - # Look in tor's notice.log for 'Every introduction point for service' - Onions: [] - # use the Services list to list elays you want the whitelisted - # Look in tor's notice.log for 'Wanted to contact directory mirror' - Services: [] -""" - -sBAD_NODES = """ -BadNodes: - # list the internet domains you know are bad so you don't - # waste time trying to download contacts from them. - ExcludeDomains: [] - ExcludeNodes: - # BadExit will be overwritten by this program - BadExit: [] - # list MyBadExit in --bad_sections if you want it used, to exclude nodes - # or any others as a list separated by comma(,) - MyBadExit: [] -""" - -__doc__ +=f"""But there's a problem, and your Tor notice.log will tell you about it: +__doc__ +="""But there's a problem, and your Tor notice.log will tell you about it: you could exclude the relays needed to access hidden services or mirror directories. So we need to add to the process the concept of a whitelist. In addition, we may have our own blacklist of nodes we want to exclude, @@ -56,27 +26,18 @@ or use these lists for other applications like selektor. So we make two files that are structured in YAML: ``` /etc/tor/yaml/torrc-goodnodes.yaml -{sGOOD_NODES} - +GoodNodes: + Relays: + IntroductionPoints: + - NODEFINGERPRINT + ... By default all sections of the goodnodes.yaml are used as a whitelist. -Use the GoodNodes/Onions list to list onion services you want the -Introduction Points whitelisted - these points may change daily -Look in tor's notice.log for warnings of 'Every introduction point for service' - -```--hs_dir``` ```default='/var/lib/tor'``` will make the program -parse the files named ```hostname``` below this dir to find -Hidden Services to whitelist. - -The Introduction Points can change during the day, so you may want to -rerun this program to freshen the list of Introduction Points. A full run -that processes all the relays from stem can take 30 minutes, or run with: - -```--saved_only``` will run the program with just cached information -on the relats, but will update the Introduction Points from the Services. - /etc/tor/yaml/torrc-badnodes.yaml -{sBAD_NODES} +BadNodes: + ExcludeExitNodes: + BadExit: + # $0000000000000000000000000000000000000007 ``` That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML) https://github.com/yaml/pyyaml/ or ```ruamel```: do @@ -85,7 +46,7 @@ the advantage of the former is that it preserves comments. (You may have to run this as the Tor user to get RW access to /run/tor/control, in which case the directory for the YAML files must -be group Tor writeable, and its parent's directories group Tor RX.) +be group Tor writeable, and its parents group Tor RX.) Because you don't want to exclude the introduction points to any onion you want to connect to, ```--white_onions``` should whitelist the @@ -93,13 +54,6 @@ introduction points to a comma sep list of onions; we fixed stem to do this: * https://github.com/torproject/stem/issues/96 * https://gitlab.torproject.org/legacy/trac/-/issues/25417 -Use the GoodNodes/Onions list in goodnodes.yaml to list onion services -you want the Introduction Points whitelisted - these points may change daily. 
-Look in tor's notice.log for 'Every introduction point for service' - -```notice_log``` will parse the notice log for warnings about relays and -services that will then be whitelisted. - ```--torrc_output``` will write the torrc ExcludeNodes configuration to a file. ```--good_contacts``` will write the contact info as a ciiss dictionary @@ -124,7 +78,7 @@ list of fingerprints to ```ExitNodes```, a whitelist of relays to use as exits. 3. clean relays that don't have "good' contactinfo. (implies 1) ```=Empty,NoEmail,NotGood``` -The default is ```Empty,NoEmail,NotGood``` ; ```NoEmail``` is inherently imperfect +The default is ```=Empty,NotGood``` ; ```NoEmail``` is inherently imperfect in that many of the contact-as-an-email are obfuscated, but we try anyway. To be "good" the ContactInfo must: @@ -133,21 +87,9 @@ To be "good" the ContactInfo must: 3. must support getting the file with a valid SSL cert from a recognized authority 4. (not in the spec but added by Python) must use a TLS SSL > v1 5. must have a fingerprint list in the file -6. must have the FP that got us the contactinfo in the fingerprint list in the file. - -```--wait_boot``` is the number of seconds to wait for Tor to booststrap - -```--wellknown_output``` will make the program write the well-known files -(```/.well-known/tor-relay/rsa-fingerprint.txt```) to a directory. - -```--torrc_output``` will write a file of the commands that it sends to -the Tor controller, so you can include it in a ```/etc/toc/torrc```. - -```--relays_output write the download relays in json to a file. The relays -are downloaded from https://onionoo.torproject.org/details +6. must have the FP that got us the contactinfo in the fingerprint list in the file, For usage, do ```python3 exclude_badExits.py --help` -See [exclude_badExits.txt](./exclude_badExits.txt) """ @@ -157,9 +99,7 @@ See [exclude_badExits.txt](./exclude_badExits.txt) import argparse import os import json -import re import sys -import tempfile import time from io import StringIO @@ -217,68 +157,76 @@ try: except ImportError: oPARSER = None -oCONTACT_RE = re.compile(r'([^:]*)(\s+)(email|url|proof|ciissversion|abuse|gpg):') - ETC_DIR = '/usr/local/etc/tor/yaml' -aGOOD_CONTACTS_DB = {} -aGOOD_CONTACTS_FPS = {} -aBAD_CONTACTS_DB = {} +aTRUST_DB = {} +aTRUST_DB_INDEX = {} aRELAYS_DB = {} aRELAYS_DB_INDEX = {} aFP_EMAIL = {} aDOMAIN_FPS = {} sDETAILS_URL = "https://metrics.torproject.org/rs.html#details/" # You can call this while bootstrapping -sEXCLUDE_EXIT_GROUP = 'ExcludeNodes' +sEXCLUDE_EXIT_KEY = 'ExcludeNodes' sINCLUDE_EXIT_KEY = 'ExitNodes' oBAD_ROOT = 'BadNodes' -aBAD_NODES = safe_load(sBAD_NODES) +oBAD_NODES = safe_load(""" +BadNodes: + ExcludeDomains: [] + ExcludeNodes: + BadExit: [] +""") sGOOD_ROOT = 'GoodNodes' sINCLUDE_GUARD_KEY = 'EntryNodes' sEXCLUDE_DOMAINS = 'ExcludeDomains' -aGOOD_NODES = safe_load(sGOOD_NODES) +oGOOD_NODES = safe_load(""" +GoodNodes: + EntryNodes: [] + Relays: + ExitNodes: [] + IntroductionPoints: [] + Onions: [] + Services: [] +""") lKNOWN_NODNS = [] tMAYBE_NODNS = set() def lYamlBadNodes(sFile, - section=sEXCLUDE_EXIT_GROUP, - tWanted=None): - global aBAD_NODES + section=sEXCLUDE_EXIT_KEY, + lWanted=['BadExit']): + global oBAD_NODES global lKNOWN_NODNS global tMAYBE_NODNS - l = [] - if tWanted is None: tWanted = {'BadExit'} if not yaml: - return l + return [] if os.path.exists(sFile): with open(sFile, 'rt') as oFd: - aBAD_NODES = safe_load(oFd) + oBAD_NODES = safe_load(oFd) - root = sEXCLUDE_EXIT_GROUP + # BROKEN +# root = sEXCLUDE_EXIT_KEY # for elt 
in o[oBAD_ROOT][root][section].keys(): -# if tWanted and elt not in tWanted: continue +# if lWanted and elt not in lWanted: continue # # l += o[oBAD_ROOT][root][section][elt] - for sub in tWanted: - l += aBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_GROUP][sub] + l = oBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_KEY]['BadExit'] tMAYBE_NODNS = set(safe_load(StringIO(yKNOWN_NODNS))) root = sEXCLUDE_DOMAINS - if sEXCLUDE_DOMAINS in aBAD_NODES[oBAD_ROOT] and aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS]: - tMAYBE_NODNS.update(set(aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS])) + if root in oBAD_NODES[oBAD_ROOT] and oBAD_NODES[oBAD_ROOT][root]: + tMAYBE_NODNS.extend(oBAD_NODES[oBAD_ROOT][root]) return l def lYamlGoodNodes(sFile='/etc/tor/torrc-goodnodes.yaml'): - global aGOOD_NODES + global oGOOD_NODES l = [] if not yaml: return l if os.path.exists(sFile): with open(sFile, 'rt') as oFd: o = safe_load(oFd) - aGOOD_NODES = o + oGOOD_NODES = o if 'EntryNodes' in o[sGOOD_ROOT].keys(): l = o[sGOOD_ROOT]['EntryNodes'] # yq '.Nodes.IntroductionPoints|.[]' < /etc/tor/torrc-goodnodes.yaml @@ -304,19 +252,9 @@ def bdomain_is_bad(domain, fp): tBAD_URLS = set() lAT_REPS = ['[]', ' at ', '(at)', '[at]', '', '(att)', '_at_', '~at~', '.at.', '!at!', 't', '<(a)>', '|__at-|', '<:at:>', - '[__at ]', '"a t"', 'removeme at ', ' a7 ', '{at-}' - '[at}', 'atsign', '-at-', '(at_sign)', 'a.t', - 'atsignhere', ' _a_ ', ' (at-sign) ', "'at sign'", - '(a)', ' atsign ', '(at symbol)', ' anat ', '=at=', - '-at-', '-dot-', ' [a] ','(at)', '', '[at sign]', - '"at"', '{at}', '-----symbol for email----', '[at@]', - '(at sign here)', '==at', '|=dot|','/\t', - ] + '[__at ]', '"a t"', 'removeme at '] lDOT_REPS = [' point ', ' dot ', '[dot]', '(dot)', '_dot_', '!dot!', '<.>', - '<:dot:>', '|dot--|', ' d07 ', '', '(dot]', '{dot)', - 'd.t', "'dot'", '(d)', '-dot-', ' adot ', - '(d)', ' . 
', '[punto]', '(point)', '"dot"', '{.}', - '--separator--', '|=dot|', ' period ', ')dot(', + '<:dot:>', '|dot--|', ] lNO_EMAIL = [ '', @@ -341,26 +279,18 @@ lNO_EMAIL = [ 'your@email.com', r'', ] -# -lMORONS = ['hoster:Quintex Alliance Consulting '] - def sCleanEmail(s): s = s.lower() for elt in lAT_REPS: - if not elt.startswith(' '): - s = s.replace(' ' + elt + ' ', '@') - s = s.replace(elt, '@') + s = s.replace(' ' + elt + ' ', '@').replace(elt, '@') for elt in lDOT_REPS: - if not elt.startswith(' '): - s = s.replace(' ' + elt + ' ', '.') s = s.replace(elt, '.') s = s.replace('(dash)', '-') - s = s.replace('hyphen ', '-') for elt in lNO_EMAIL: - s = s.replace(elt, '?') + s = s.replace(elt, '') return s -lEMAILS = ['abuse', 'email'] +lATS = ['abuse', 'email'] lINTS = ['ciissversion', 'uplinkbw', 'signingkeylifetime', 'memory'] lBOOLS = ['dnssec', 'dnsqname', 'aesni', 'autoupdate', 'dnslocalrootzone', 'sandbox', 'offlinemasterkey'] @@ -375,7 +305,7 @@ def aCleanContact(a): a[elt] = True else: a[elt] = False - for elt in lEMAILS: + for elt in lATS: if elt not in a: continue a[elt] = sCleanEmail(a[elt]) if 'url' in a.keys(): @@ -394,8 +324,8 @@ def bVerifyContact(a=None, fp=None, https_cafile=None): global aFP_EMAIL global tBAD_URLS global lKNOWN_NODNS - global aGOOD_CONTACTS_DB - global aGOOD_CONTACTS_FPS + global aTRUST_DB + global aTRUST_DB_INDEX assert a assert fp assert https_cafile @@ -416,10 +346,10 @@ def bVerifyContact(a=None, fp=None, https_cafile=None): LOG.warn(f"{fp} 'proof' not in {keys}") return a - if aGOOD_CONTACTS_FPS and fp in aGOOD_CONTACTS_FPS.keys(): - aCachedContact = aGOOD_CONTACTS_FPS[fp] + if aTRUST_DB_INDEX and fp in aTRUST_DB_INDEX.keys(): + aCachedContact = aTRUST_DB_INDEX[fp] if aCachedContact['email'] == a['email']: - LOG.info(f"{fp} in aGOOD_CONTACTS_FPS") + LOG.info(f"{fp} in aTRUST_DB_INDEX") return aCachedContact if 'url' not in keys: @@ -447,16 +377,53 @@ def bVerifyContact(a=None, fp=None, https_cafile=None): lKNOWN_NODNS.append(domain) return a + if a['proof'] in ['dns-rsa']: + # only support uri for now + if False and ub_ctx: + fp_domain = fp + '.' + domain + if idns_validate(fp_domain, + libunbound_resolv_file='resolv.conf', + dnssec_DS_file='dnssec-root-trust', + ) == 0: + pass + LOG.warn(f"{fp} proof={a['proof']} - assumed good") + a['fps'] = [fp] + aTRUST_DB_INDEX[fp] = a + return a return True -def oVerifyUrl(url, domain, fp=None, https_cafile=None, timeout=20, host='127.0.0.1', port=9050, oargs=None): +# async +# If we keep a cache of FPs that we have gotten by downloading a URL +# we can avoid re-downloading the URL of other FP in the list of relays. +# If we paralelize the gathering of the URLs, we may have simultaneous +# gathers of the same URL from different relays, defeating the advantage +# of going parallel. The cache is global aDOMAIN_FPS. 
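+# Illustrative note (placeholder domain, not part of this codebase): once the
+# fingerprint file for e.g. example.org has been fetched for one relay,
+# aDOMAIN_FPS['example.org'] holds its fingerprint list, so later relays whose
+# ContactInfo url points at the same domain reuse it without another download.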
+def aVerifyContact(a=None, fp=None, https_cafile=None, timeout=20, host='127.0.0.1', port=9050, oargs=None): + global aFP_EMAIL + global tBAD_URLS + global lKNOWN_NODNS + global aDOMAIN_FPS + + assert a + assert fp + assert https_cafile + + r = bVerifyContact(a=a, fp=fp, https_cafile=https_cafile) + if r is not True: + return r + + domain = a['url'].replace('https://', '').replace('http://', '').rstrip('/') + if domain in aDOMAIN_FPS.keys(): + a['fps'] = aDOMAIN_FPS[domain] + return a + +# LOG.debug(f"{len(keys)} contact fields for {fp}") + url = a['url'] + "/.well-known/tor-relay/rsa-fingerprint.txt" + if url in aDOMAIN_FPS.keys(): + a['fps'] = aDOMAIN_FPS[url] + return a if bAreWeConnected() is False: raise SystemExit("we are not connected") - if url in tBAD_URLS: - LOG.debug(f"BC Known bad url from {domain} for {fp}") - return None - - o = None try: if httpx: LOG.debug(f"Downloading from {domain} for {fp}") @@ -471,99 +438,35 @@ def oVerifyUrl(url, domain, fp=None, https_cafile=None, timeout=20, host='127.0. content_type='text/plain') # requests response: text "reason", "status_code" except AttributeError as e: - LOG.exception(f"BC AttributeError downloading from {domain} {e}") - tBAD_URLS.add(url) + LOG.exception(f"AttributeError downloading from {domain} {e}") except CertificateError as e: - LOG.warn(f"BC CertificateError downloading from {domain} {e}") - tBAD_URLS.add(url) + LOG.warn(f"CertificateError downloading from {domain} {e}") + tBAD_URLS.add(a['url']) except TrustorError as e: if e.args == "HTTP Errorcode 404": aFP_EMAIL[fp] = a['email'] - LOG.warn(f"BC TrustorError 404 from {domain} {e.args}") + LOG.warn(f"TrustorError 404 from {domain} {e.args}") else: - LOG.warn(f"BC TrustorError downloading from {domain} {e.args}") - tBAD_URLS.add(url) + LOG.warn(f"TrustorError downloading from {domain} {e.args}") + tBAD_URLS.add(a['url']) except (urllib3.exceptions.MaxRetryError, urllib3.exceptions.ProtocolError,) as e: # noqa # # maybe offline - not bad - LOG.warn(f"BC MaxRetryError downloading from {domain} {e}") + LOG.warn(f"MaxRetryError downloading from {domain} {e}") except (BaseException) as e: - LOG.error(f"BC Exception {type(e)} downloading from {domain} {e}") + LOG.error(f"Exception {type(e)} downloading from {domain} {e}") else: - return o - return None - -# async -# If we keep a cache of FPs that we have gotten by downloading a URL -# we can avoid re-downloading the URL of other FP in the list of relays. -# If we paralelize the gathering of the URLs, we may have simultaneous -# gathers of the same URL from different relays, defeating the advantage -# of going parallel. The cache is global aDOMAIN_FPS. -def aVerifyContact(a=None, fp=None, https_cafile=None, timeout=20, host='127.0.0.1', port=9050, oargs=None): - global aFP_EMAIL - global tBAD_URLS - global lKNOWN_NODNS - global aDOMAIN_FPS - global aBAD_CONTACTS_DB - - assert a - assert fp - assert https_cafile - - domain = a['url'].replace('https://', '').replace('http://', '').rstrip('/') - a['url'] = 'https://' + domain - if domain in aDOMAIN_FPS.keys(): - a['fps'] = aDOMAIN_FPS[domain] - return a - - r = bVerifyContact(a=a, fp=fp, https_cafile=https_cafile) - if r is not True: - return r - if a['url'] in tBAD_URLS: - a['fps'] = [] - return a - - if a['proof'] == 'dns-rsa': - if ub_ctx: - fp_domain = fp + '.' 
+ domain - if idns_validate(fp_domain, - libunbound_resolv_file='resolv.conf', - dnssec_DS_file='dnssec-root-trust', - ) == 0: - LOG.warn(f"{fp} proof={a['proof']} - validated good") - a['fps'] = [fp] - aGOOD_CONTACTS_FPS[fp] = a - else: - a['fps'] = [] - return a - # only test url for now drop through - url = a['url'] - else: - url = a['url'] + "/.well-known/tor-relay/rsa-fingerprint.txt" - o = oVerifyUrl(url, domain, fp=fp, https_cafile=https_cafile, timeout=timeout, host=host, port=port, oargs=oargs) - if not o: - LOG.warn(f"BC Failed Download from {url} ") - a['fps'] = [] - tBAD_URLS.add(url) - aBAD_CONTACTS_DB[fp] = a - elif a['proof'] == 'dns-rsa': - # well let the test of the URL be enough for now - LOG.debug(f"Downloaded from {url} ") - a['fps'] = [fp] - aDOMAIN_FPS[domain] = a['fps'] - elif a['proof'] == 'uri-rsa': a = aContactFps(oargs, a, o, domain) - if a['fps']: - LOG.debug(f"Downloaded from {url} {len(a['fps'])} FPs for {fp}") - else: - aBAD_CONTACTS_DB[fp] = a - LOG.debug(f"BC Downloaded from {url} NO FPs for {fp}") + LOG.debug(f"Downloaded from {domain} {len(a['fps'])} FPs for {fp}") aDOMAIN_FPS[domain] = a['fps'] + url = a['url'] + aDOMAIN_FPS[url] = a['fps'] return a def aContactFps(oargs, a, o, domain): global aFP_EMAIL global tBAD_URLS + global lKNOWN_NODNS global aDOMAIN_FPS if hasattr(o, 'status'): @@ -593,7 +496,7 @@ def aContactFps(oargs, a, o, domain): with open(sfile, 'wt') as oFd: oFd.write(data) except Exception as e: - LOG.warn(f"Error writing {sfile} {e}") + LOG.warn(f"Error wirting {sfile} {e}") a['modified'] = int(time.time()) if not l: @@ -603,6 +506,7 @@ def aContactFps(oargs, a, o, domain): and len(elt) == 40 \ and not elt.startswith('#')] LOG.info(f"Downloaded from {domain} {len(a['fps'])} FPs") + aDOMAIN_FPS[domain] = a['fps'] return a def aParseContact(contact, fp): @@ -612,33 +516,23 @@ def aParseContact(contact, fp): """ a = {} if not contact: - LOG.warn(f"BC null contact for {fp}") + LOG.warn(f"null contact for {fp}") LOG.debug(f"{fp} {contact}") return {} - - contact = contact.split(r'\n')[0] - for elt in lMORONS: - contact = contact.replace(elt) - m = oCONTACT_RE.match(contact) - # 450 matches! - if m and m.groups and len(m.groups(0)) > 2 and m.span()[1] > 0: - i = len(m.groups(0)[0]) + len(m.groups(0)[1]) - contact = contact[i:] - # shlex? 
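+    # Illustrative input (made-up values): a ContactInfo string such as
+    # 'email:tor[]example.org url:https://example.org proof:uri-rsa ciissversion:2'
+    # is split on spaces below and each 'key:value' field becomes a dict entry.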
lelts = contact.split(' ') if not lelts: - LOG.warn(f"BC empty contact for {fp}") + LOG.warn(f"empty contact for {fp}") LOG.debug(f"{fp} {contact}") return {} - for elt in lelts: if ':' not in elt: + if elt == 'DFRI': + # oddball + continue # hoster:Quintex Alliance Consulting - LOG.warn(f"BC no : in {elt} for {contact} in {fp}") - # return {} - # try going with what we have - break + LOG.warn(f"no : in {elt} for {contact} in {fp}") + return {} (key , val,) = elt.split(':', 1) if key == '': continue @@ -686,9 +580,9 @@ def oMainArgparser(_=None): default='127.0.0.1', help='proxy host') parser.add_argument('--proxy_port', '--proxy-port', default=9050, type=int, - help='proxy socks port') + help='proxy control port') parser.add_argument('--proxy_ctl', '--proxy-ctl', - default='/run/tor/control' if os.path.exists('/run/tor/control') else '9051', + default='/run/tor/control' if os.path.exists('/run/tor/control') else 9051, type=str, help='control socket - or port') @@ -705,14 +599,12 @@ def oMainArgparser(_=None): parser.add_argument('--bad_nodes', type=str, default=os.path.join(ETC_DIR, 'badnodes.yaml'), help="Yaml file of bad nodes that should also be excluded") - parser.add_argument('--bad_on', type=str, default='Empty,NoEmail,NotGood', + parser.add_argument('--bad_on', type=str, default='Empty,NotGood', help="comma sep list of conditions - Empty,NoEmail,NotGood") parser.add_argument('--bad_contacts', type=str, default=os.path.join(ETC_DIR, 'badcontacts.yaml'), help="Yaml file of bad contacts that bad FPs are using") - parser.add_argument('--saved_only', default=False, - action='store_true', - help="Just use the info in the last *.yaml files without querying the Tor controller") + parser.add_argument('--strict_nodes', type=str, default=0, choices=['0', '1'], help="Set StrictNodes: 1 is less anonymous but more secure, although some onion sites may be unreachable") @@ -723,20 +615,14 @@ def oMainArgparser(_=None): parser.add_argument('--log_level', type=int, default=20, help="10=debug 20=info 30=warn 40=error") parser.add_argument('--bad_sections', type=str, - default='BadExit', - help="sections of the badnodes.yaml to use, in addition to BadExit, comma separated") + default='MyBadExit', + help="sections of the badnodes.yaml to use, comma separated, '' BROKEN") parser.add_argument('--white_onions', type=str, default='', help="comma sep. 
list of onions to whitelist their introduction points - BROKEN") parser.add_argument('--torrc_output', type=str, default=os.path.join(ETC_DIR, 'torrc.new'), help="Write the torrc configuration to a file") - parser.add_argument('--hs_dir', type=str, - default='/var/lib/tor', - help="Parse the files name hostname below this dir to find Hidden Services to whitelist") - parser.add_argument('--notice_log', type=str, - default='', - help="Parse the notice log for relays and services") parser.add_argument('--relays_output', type=str, default=os.path.join(ETC_DIR, 'relays.json'), help="Write the download relays in json to a file") @@ -748,43 +634,40 @@ def oMainArgparser(_=None): return parser def vwrite_good_contacts(oargs): - global aGOOD_CONTACTS_DB + global aTRUST_DB good_contacts_tmp = oargs.good_contacts + '.tmp' with open(good_contacts_tmp, 'wt') as oFYaml: - yaml.dump(aGOOD_CONTACTS_DB, oFYaml) + yaml.dump(aTRUST_DB, oFYaml) oFYaml.close() if os.path.exists(oargs.good_contacts): bak = oargs.good_contacts +'.bak' os.rename(oargs.good_contacts, bak) os.rename(good_contacts_tmp, oargs.good_contacts) - LOG.info(f"Wrote {len(list(aGOOD_CONTACTS_DB.keys()))} good contact details to {oargs.good_contacts}") - bad_contacts_tmp = good_contacts_tmp.replace('.tmp', '.bad') - with open(bad_contacts_tmp, 'wt') as oFYaml: - yaml.dump(aBAD_CONTACTS_DB, oFYaml) - oFYaml.close() + LOG.info(f"Wrote {len(list(aTRUST_DB.keys()))} good contact details to {oargs.good_contacts}") -def vwrite_badnodes(oargs, aBAD_NODES, slen, stag): - if not aBAD_NODES: return - tmp = oargs.bad_nodes +'.tmp' - bak = oargs.bad_nodes +'.bak' - with open(tmp, 'wt') as oFYaml: - yaml.dump(aBAD_NODES, oFYaml) - LOG.info(f"Wrote {slen} to {stag} in {oargs.bad_nodes}") - oFYaml.close() - if os.path.exists(oargs.bad_nodes): - os.rename(oargs.bad_nodes, bak) - os.rename(tmp, oargs.bad_nodes) +def vwrite_badnodes(oargs, oBAD_NODES, slen): + if oargs.bad_nodes: + tmp = oargs.bad_nodes +'.tmp' + bak = oargs.bad_nodes +'.bak' + with open(tmp, 'wt') as oFYaml: + yaml.dump(oBAD_NODES, oFYaml) + LOG.info(f"Wrote {slen} to {oargs.bad_nodes}") + oFYaml.close() + if os.path.exists(oargs.bad_nodes): + os.rename(oargs.bad_nodes, bak) + os.rename(tmp, oargs.bad_nodes) -def vwrite_goodnodes(oargs, aGOOD_NODES, ilen): - tmp = oargs.good_nodes +'.tmp' - bak = oargs.good_nodes +'.bak' - with open(tmp, 'wt') as oFYaml: - yaml.dump(aGOOD_NODES, oFYaml) - LOG.info(f"Wrote {ilen} good relays to {oargs.good_nodes}") - oFYaml.close() - if os.path.exists(oargs.good_nodes): - os.rename(oargs.good_nodes, bak) - os.rename(tmp, oargs.good_nodes) +def vwrite_goodnodes(oargs, oGOOD_NODES, ilen): + if oargs.good_nodes: + tmp = oargs.good_nodes +'.tmp' + bak = oargs.good_nodes +'.bak' + with open(tmp, 'wt') as oFYaml: + yaml.dump(oGOOD_NODES, oFYaml) + LOG.info(f"Wrote {ilen} good relays to {oargs.good_nodes}") + oFYaml.close() + if os.path.exists(oargs.good_nodes): + os.rename(oargs.good_nodes, bak) + os.rename(tmp, oargs.good_nodes) def lget_onionoo_relays(oargs): import requests @@ -897,19 +780,18 @@ def vsetup_logging(log_level, logfile='', stream=sys.stdout): LOG.addHandler(oHandler) LOG.info(f"SSetting log_level to {log_level!s}") -def vwritefinale(oargs): - global lNOT_IN_RELAYS_DB - - if len(lNOT_IN_RELAYS_DB): - LOG.warn(f"{len(lNOT_IN_RELAYS_DB)} relays from stem were not in onionoo.torproject.org") +def vwritefinale(oargs, lNotInaRELAYS_DB): + if len(lNotInaRELAYS_DB): + LOG.warn(f"{len(lNotInaRELAYS_DB)} relays from stem were not in onionoo.torproject.org") 
LOG.info(f"For info on a FP, use: https://nusenu.github.io/OrNetStats/w/relay/.html") - LOG.info(f"For info on relays, try: https://onionoo.torproject.org/details") + LOG.info(f"For info on relays, use: https://onionoo.torproject.org/details") # https://onionoo.torproject.org/details + LOG.info(f"although it's often broken") def bProcessContact(b, texclude_set, aBadContacts, iFakeContact=0): - global aGOOD_CONTACTS_DB - global aGOOD_CONTACTS_FPS + global aTRUST_DB + global aTRUST_DB_INDEX sofar = '' fp = b['fp'] # need to skip urllib3.exceptions.MaxRetryError @@ -931,16 +813,15 @@ def bProcessContact(b, texclude_set, aBadContacts, iFakeContact=0): LOG.info(f"{fp} GOOD {b['url']} {sofar}") # add our contact info to the trustdb - aGOOD_CONTACTS_DB[fp] = b + aTRUST_DB[fp] = b for elt in b['fps']: - aGOOD_CONTACTS_FPS[elt] = b + aTRUST_DB_INDEX[elt] = b return True -def bCheckFp(relay, sofar, lConds, texclude_set): - global aGOOD_CONTACTS_DB - global aGOOD_CONTACTS_FPS - global lNOT_IN_RELAYS_DB +def bCheckFp(relay, sofar, lConds, texclude_set, lNotInaRELAYS_DB): + global aTRUST_DB + global aTRUST_DB_INDEX if not is_valid_fingerprint(relay.fingerprint): LOG.warn('Invalid Fingerprint: %s' % relay.fingerprint) @@ -949,17 +830,17 @@ def bCheckFp(relay, sofar, lConds, texclude_set): fp = relay.fingerprint if aRELAYS_DB and fp not in aRELAYS_DB.keys(): LOG.warn(f"{fp} not in aRELAYS_DB") - lNOT_IN_RELAYS_DB += [fp] + lNotInaRELAYS_DB += [fp] if not relay.exit_policy.is_exiting_allowed(): - if sEXCLUDE_EXIT_GROUP == sEXCLUDE_EXIT_GROUP: + if sEXCLUDE_EXIT_KEY == sEXCLUDE_EXIT_KEY: pass # LOG.debug(f"{fp} not an exit {sofar}") else: pass # LOG.warn(f"{fp} not an exit {sofar}") # return None # great contact had good fps and we are in them - if fp in aGOOD_CONTACTS_FPS.keys(): + if fp in aTRUST_DB_INDEX.keys(): # a cached entry return None @@ -975,8 +856,8 @@ def bCheckFp(relay, sofar, lConds, texclude_set): contact = sCleanEmail(relay.contact) # fail if the contact has no email - unreliable - if 'NoEmail' in lConds and relay.contact and \ - ('@' not in contact): + if ('NoEmail' in lConds and relay.contact and + ('@' not in contact and 'email:' not in contact)): LOG.info(f"{fp} skipping contact - NoEmail {contact} {sofar}") LOG.debug(f"{fp} {relay.contact} {sofar}") texclude_set.add(fp) @@ -1000,8 +881,8 @@ def bCheckFp(relay, sofar, lConds, texclude_set): return True def oMainPreamble(lArgs): - global aGOOD_CONTACTS_DB - global aGOOD_CONTACTS_FPS + global aTRUST_DB + global aTRUST_DB_INDEX parser = oMainArgparser() oargs = parser.parse_args(lArgs) @@ -1018,20 +899,20 @@ def oMainPreamble(lArgs): if sFile and os.path.exists(sFile): try: with open(sFile, 'rt') as oFd: - aGOOD_CONTACTS_DB = safe_load(oFd) - LOG.info(f"{len(aGOOD_CONTACTS_DB.keys())} trusted contacts from {sFile}") + aTRUST_DB = safe_load(oFd) + LOG.info(f"{len(aTRUST_DB.keys())} trusted contacts from {sFile}") # reverse lookup of fps to contacts # but... 
- for (k, v,) in aGOOD_CONTACTS_DB.items(): + for (k, v,) in aTRUST_DB.items(): if 'modified' not in v.keys(): v['modified'] = int(time.time()) - aGOOD_CONTACTS_FPS[k] = v - if 'fps' in aGOOD_CONTACTS_DB[k].keys(): - for fp in aGOOD_CONTACTS_DB[k]['fps']: - if fp in aGOOD_CONTACTS_FPS: + aTRUST_DB_INDEX[k] = v + if 'fps' in aTRUST_DB[k].keys(): + for fp in aTRUST_DB[k]['fps']: + if fp in aTRUST_DB_INDEX: continue - aGOOD_CONTACTS_FPS[fp] = v - LOG.info(f"{len(aGOOD_CONTACTS_FPS.keys())} good relays from {sFile}") + aTRUST_DB_INDEX[fp] = v + LOG.info(f"{len(aTRUST_DB_INDEX.keys())} good relays from {sFile}") except Exception as e: LOG.exception(f"Error reading YAML TrustDB {sFile} {e}") @@ -1054,9 +935,9 @@ def oStemController(oargs): # does it work dynamically? return 2 - elt = controller.get_conf(sEXCLUDE_EXIT_GROUP) + elt = controller.get_conf(sEXCLUDE_EXIT_KEY) if elt and elt != '{??}': - LOG.warn(f"{sEXCLUDE_EXIT_GROUP} is in use already") + LOG.warn(f"{sEXCLUDE_EXIT_KEY} is in use already") return controller @@ -1067,50 +948,22 @@ def tWhitelistSet(oargs, controller): LOG.info(f"lYamlGoodNodes {len(twhitelist_set)} EntryNodes from {oargs.good_nodes}") t = set() - if 'IntroductionPoints' in aGOOD_NODES[sGOOD_ROOT]['Relays'].keys(): - t = set(aGOOD_NODES[sGOOD_ROOT]['Relays']['IntroductionPoints']) - - if oargs.hs_dir and os.path.exists(oargs.hs_dir): - for (dirpath, dirnames, filenames,) in os.walk(oargs.hs_dir): - for f in filenames: - if f != 'hostname': continue - with open(os.path.join(dirpath, f), 'rt') as oFd: - son = oFd.read() - t.update(son) - LOG.info(f"Added {son} to the list for Introduction Points") - - if oargs.notice_log and os.path.exists(oargs.notice_log): - tmp = tempfile.mktemp() - i = os.system(f"grep 'Every introduction point for service' {oargs.notice_log} |sed -e 's/.* service //' -e 's/ is .*//'|sort -u |sed -e '/ /d' > {tmp}") - if i: - with open(tmp, 'rt') as oFd: - tnew = {elt.strip() for elt in oFd.readlines()} - t.update(tnew) - LOG.info(f"Whitelist {len(lnew)} services from {oargs.notice_log}") - os.remove(tmp) - + if sGOOD_ROOT in oGOOD_NODES and 'Relays' in oGOOD_NODES[sGOOD_ROOT] and \ + 'IntroductionPoints' in oGOOD_NODES[sGOOD_ROOT]['Relays'].keys(): + t = set(oGOOD_NODES[sGOOD_ROOT]['Relays']['IntroductionPoints']) + w = set() - if sGOOD_ROOT in aGOOD_NODES and 'Services' in aGOOD_NODES[sGOOD_ROOT].keys(): - w = set(aGOOD_NODES[sGOOD_ROOT]['Services']) + if sGOOD_ROOT in oGOOD_NODES and 'Services' in oGOOD_NODES[sGOOD_ROOT].keys(): + w = set(oGOOD_NODES[sGOOD_ROOT]['Services']) + twhitelist_set.update(w) if len(w) > 0: - LOG.info(f"Whitelist {len(w)} relays from {sGOOD_ROOT}/Services") + LOG.info(f"Whitelist {len(t)} relays from Services") - if oargs.notice_log and os.path.exists(oargs.notice_log): - tmp = tempfile.mktemp() - i = os.system(f"grep 'Wanted to contact directory mirror \$' /var/lib/tor/.SelekTOR/3xx/cache/9050/notice.log|sed -e 's/.* \$//' -e 's/[~ ].*//'|sort -u > {tmp}") - if i: - with open(tmp, 'rt') as oFd: - lnew = oFd.readlines() - w.update(set(lnew)) - LOG.info(f"Whitelist {len(lnew)} relays from {oargs.notice_log}") - os.remove(tmp) - twhitelist_set.update(w) - w = set() - if 'Onions' in aGOOD_NODES[sGOOD_ROOT].keys(): + if 'Onions' in oGOOD_NODES[sGOOD_ROOT].keys(): # Provides the descriptor for a hidden service. 
The **address** is the # '.onion' address of the hidden service - w = set(aGOOD_NODES[sGOOD_ROOT]['Onions']) + w = set(oGOOD_NODES[sGOOD_ROOT]['Onions']) if oargs.white_onions: w.update(oargs.white_onions.split(',')) if oargs.points_timeout > 0: @@ -1124,68 +977,63 @@ def tWhitelistSet(oargs, controller): def tExcludeSet(oargs): texclude_set = set() - sections = {'BadExit'} if oargs.bad_nodes and os.path.exists(oargs.bad_nodes): - if oargs.bad_sections: - sections.update(oargs.bad_sections.split(',')) - texclude_set = set(lYamlBadNodes(oargs.bad_nodes, - tWanted=sections, - section=sEXCLUDE_EXIT_GROUP)) - LOG.info(f"Preloaded {len(texclude_set)} bad fps") + if False and oargs.bad_sections: + # BROKEN + sections = oargs.bad_sections.split(',') + texclude_set = set(lYamlBadNodes(oargs.bad_nodes, + lWanted=sections, + section=sEXCLUDE_EXIT_KEY)) + LOG.info(f"Preloaded {len(texclude_set)} bad fps") return texclude_set # async def iMain(lArgs): - global aGOOD_CONTACTS_DB - global aGOOD_CONTACTS_FPS - global aBAD_CONTACTS_DB - global aBAD_NODES - global aGOOD_NODES + global aTRUST_DB + global aTRUST_DB_INDEX + global oBAD_NODES + global oGOOD_NODES global lKNOWN_NODNS global aRELAYS_DB global aRELAYS_DB_INDEX global tBAD_URLS - global lNOT_IN_RELAYS_DB - + oargs = oMainPreamble(lArgs) controller = oStemController(oargs) twhitelist_set = tWhitelistSet(oargs, controller) texclude_set = tExcludeSet(oargs) - ttrust_db_index = aGOOD_CONTACTS_FPS.keys() + ttrust_db_index = aTRUST_DB_INDEX.keys() + tdns_urls = set() iFakeContact = 0 iTotalContacts = 0 aBadContacts = {} - lNOT_IN_RELAYS_DB = [] + lNotInaRELAYS_DB = [] iR = 0 relays = controller.get_server_descriptors() lqueue = [] socksu = f"socks5://{oargs.proxy_host}:{oargs.proxy_port}" - if oargs.saved_only: - relays = [] for relay in relays: iR += 1 fp = relay.fingerprint = relay.fingerprint.upper() - sofar = f"G:{len(aGOOD_CONTACTS_DB.keys())} F:{iFakeContact} BF:{len(texclude_set)} GF:{len(ttrust_db_index)} TC:{iTotalContacts} #{iR}" + sofar = f"G:{len(aTRUST_DB.keys())} U:{len(tdns_urls)} F:{iFakeContact} BF:{len(texclude_set)} GF:{len(ttrust_db_index)} TC:{iTotalContacts} #{iR}" lConds = oargs.bad_on.split(',') - r = bCheckFp(relay, sofar, lConds, texclude_set) + r = bCheckFp(relay, sofar, lConds, texclude_set, lNotInaRELAYS_DB) if r is not True: continue # if it has a ciissversion in contact we count it in total iTotalContacts += 1 # only proceed if 'NotGood' not in lConds: - if 'NotGood' not in lConds: - continue + if 'NotGood' not in lConds: continue # fail if the contact does not have url: to pass a = aParseContact(relay.contact, fp) if not a: - LOG.warn(f"{fp} BC contact did not parse {sofar}") + LOG.warn(f"{fp} contact did not parse {sofar}") texclude_set.add(fp) - aBAD_CONTACTS_DB[fp] = a continue if 'url' in a and a['url']: @@ -1200,17 +1048,23 @@ def iMain(lArgs): # fail if the contact uses a domain we already know does not resolve if domain in lKNOWN_NODNS: # The fp is using a contact with a URL we know is bogus - LOG.info(f"{fp} BC skipping in lKNOWN_NODNS {a} {sofar}") + LOG.info(f"{fp} skipping in lKNOWN_NODNS {a} {sofar}") LOG.debug(f"{fp} {relay} {sofar}") texclude_set.add(fp) - aBAD_CONTACTS_DB[fp] = a continue # drop through - if 'proof' in a and a['proof'] in ['uri-rsa', 'dns-rsa']: + if 'dns-rsa' in relay.contact.lower(): + # skip if the contact uses a dns-rsa url we dont handle + target = f"{fp}.{domain}" + LOG.info(f"skipping 'dns-rsa' {target} {sofar}") + tdns_urls.add(target) + continue + + if 'proof:uri-rsa' in 
relay.contact.lower(): if domain in aDOMAIN_FPS.keys(): continue + a['fp'] = fp if httpx: - a['fp'] = fp lqueue.append(asyncio.create_task( aVerifyContact(a=a, fp=fp, @@ -1245,13 +1099,15 @@ def iMain(lArgs): LOG.info(f"Filtered {len(twhitelist_set)} whitelisted relays") texclude_set = texclude_set.difference(twhitelist_set) - LOG.info(f"{len(list(aGOOD_CONTACTS_DB.keys()))} good contacts out of {iTotalContacts}") + # accept the dns-rsa urls for now until we test them + texclude_set = texclude_set.difference(tdns_urls) + LOG.info(f"{len(list(aTRUST_DB.keys()))} good contacts out of {iTotalContacts}") if oargs.torrc_output and texclude_set: with open(oargs.torrc_output, 'wt') as oFTorrc: - oFTorrc.write(f"{sEXCLUDE_EXIT_GROUP} {','.join(texclude_set)}\n") - oFTorrc.write(f"{sINCLUDE_EXIT_KEY} {','.join(aGOOD_CONTACTS_FPS.keys())}\n") - oFTorrc.write(f"{sINCLUDE_GUARD_KEY} {','.join(aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])}\n") + oFTorrc.write(f"{sEXCLUDE_EXIT_KEY} {','.join(texclude_set)}\n") + oFTorrc.write(f"{sINCLUDE_EXIT_KEY} {','.join(aTRUST_DB_INDEX.keys())}\n") + oFTorrc.write(f"{sINCLUDE_GUARD_KEY} {','.join(oGOOD_NODES[sGOOD_ROOT]['EntryNodes'])}\n") LOG.info(f"Wrote tor configuration to {oargs.torrc_output}") oFTorrc.close() @@ -1261,64 +1117,56 @@ def iMain(lArgs): yaml.dump(aBadContacts, oFYaml) oFYaml.close() - if oargs.good_contacts != '' and aGOOD_CONTACTS_DB: + if oargs.good_contacts != '' and aTRUST_DB: vwrite_good_contacts(oargs) - aBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_GROUP]['BadExit'] = list(texclude_set) - aBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS] = lKNOWN_NODNS - if oargs.bad_nodes: - stag = sEXCLUDE_EXIT_GROUP + '/BadExit' - vwrite_badnodes(oargs, aBAD_NODES, str(len(texclude_set)), stag) + oBAD_NODES[oBAD_ROOT][sEXCLUDE_EXIT_KEY]['BadExit'] = list(texclude_set) + oBAD_NODES[oBAD_ROOT][sEXCLUDE_DOMAINS] = lKNOWN_NODNS + vwrite_badnodes(oargs, oBAD_NODES, str(len(texclude_set))) - aGOOD_NODES['GoodNodes']['Relays']['ExitNodes'] = list(aGOOD_CONTACTS_FPS.keys()) + oGOOD_NODES['GoodNodes']['Relays']['ExitNodes'] = list(aTRUST_DB_INDEX.keys()) # EntryNodes are readony - if oargs.good_nodes: - vwrite_goodnodes(oargs, aGOOD_NODES, len(aGOOD_CONTACTS_FPS.keys())) + vwrite_goodnodes(oargs, oGOOD_NODES, len(aTRUST_DB_INDEX.keys())) - vwritefinale(oargs) + vwritefinale(oargs, lNotInaRELAYS_DB) retval = 0 try: logging.getLogger('stem').setLevel(30) if texclude_set: try: - LOG.info(f"controller {sEXCLUDE_EXIT_GROUP} {len(texclude_set)} net bad relays") - controller.set_conf(sEXCLUDE_EXIT_GROUP, list(texclude_set)) + LOG.info(f"{sEXCLUDE_EXIT_KEY} {len(texclude_set)} net bad exit relays") + controller.set_conf(sEXCLUDE_EXIT_KEY, list(texclude_set)) except (Exception, stem.InvalidRequest, stem.SocketClosed,) as e: # noqa - LOG.error(f"Failed setting {sEXCLUDE_EXIT_GROUP} bad exit relays in Tor {e}") + LOG.error(f"Failed setting {sEXCLUDE_EXIT_KEY} bad exit relays in Tor {e}") LOG.debug(repr(texclude_set)) retval += 1 - if aGOOD_CONTACTS_FPS.keys(): - l = [elt for elt in aGOOD_CONTACTS_FPS.keys() if len (elt) == 40] + if aTRUST_DB_INDEX.keys(): + l = [elt for elt in aTRUST_DB_INDEX.keys() if len (elt) == 40] try: - LOG.info(f"controller {sINCLUDE_EXIT_KEY} {len(l)} good relays") + LOG.info(f"{sINCLUDE_EXIT_KEY} {len(l)} good relays") controller.set_conf(sINCLUDE_EXIT_KEY, l) except (Exception, stem.InvalidRequest, stem.SocketClosed) as e: # noqa LOG.error(f"Failed setting {sINCLUDE_EXIT_KEY} good exit nodes in Tor {e}") LOG.debug(repr(l)) retval += 1 - if 'EntryNodes' in 
aGOOD_NODES[sGOOD_ROOT].keys(): + if 'EntryNodes' in oGOOD_NODES[sGOOD_ROOT].keys(): try: - LOG.info(f"{sINCLUDE_GUARD_KEY} {len(aGOOD_NODES[sGOOD_ROOT]['EntryNodes'])} guard nodes") + LOG.info(f"{sINCLUDE_GUARD_KEY} {len(oGOOD_NODES[sGOOD_ROOT]['EntryNodes'])} guard nodes") # FixMe for now override StrictNodes it may be unusable otherwise controller.set_conf(sINCLUDE_GUARD_KEY, - aGOOD_NODES[sGOOD_ROOT]['EntryNodes']) + oGOOD_NODES[sGOOD_ROOT]['EntryNodes']) except (Exception, stem.InvalidRequest, stem.SocketClosed,) as e: # noqa LOG.error(f"Failed setting {sINCLUDE_GUARD_KEY} guard nodes in Tor {e}") - LOG.debug(repr(list(aGOOD_NODES[sGOOD_ROOT]['EntryNodes']))) + LOG.debug(repr(list(oGOOD_NODES[sGOOD_ROOT]['EntryNodes']))) retval += 1 cur = controller.get_conf('StrictNodes') if oargs.strict_nodes and int(cur) != oargs.strict_nodes: + LOG.info(f"OVERRIDING StrictNodes to {oargs.strict_nodes}") controller.set_conf('StrictNodes', oargs.strict_nodes) - cur = controller.get_conf('StrictNodes') - if int(cur) != oargs.strict_nodes: - LOG.warn(f"OVERRIDING StrictNodes NOT {oargs.strict_nodes}") - else: - LOG.info(f"OVERRODE StrictNodes to {oargs.strict_nodes}") - else: LOG.info(f"StrictNodes is set to {cur}") @@ -1340,6 +1188,7 @@ def iMain(lArgs): except Exception as e: LOG.warn(str(e)) + sys.stdout.write("dns-rsa domains:\n" +'\n'.join(tdns_urls) +'\n') return retval if __name__ == '__main__': diff --git a/exclude_badExits.txt b/exclude_badExits.txt deleted file mode 100644 index 8e0b180..0000000 --- a/exclude_badExits.txt +++ /dev/null @@ -1,76 +0,0 @@ -usage: exclude_badExits.py [-h] [--https_cafile HTTPS_CAFILE] - [--proxy_host PROXY_HOST] [--proxy_port PROXY_PORT] - [--proxy_ctl PROXY_CTL] [--torrc TORRC] - [--timeout TIMEOUT] [--good_nodes GOOD_NODES] - [--bad_nodes BAD_NODES] [--bad_on BAD_ON] - [--bad_contacts BAD_CONTACTS] [--saved_only] - [--strict_nodes {0,1}] [--wait_boot WAIT_BOOT] - [--points_timeout POINTS_TIMEOUT] - [--log_level LOG_LEVEL] - [--bad_sections BAD_SECTIONS] - [--white_onions WHITE_ONIONS] - [--torrc_output TORRC_OUTPUT] [--hs_dir HS_DIR] - [--notice_log NOTICE_LOG] - [--relays_output RELAYS_OUTPUT] - [--wellknown_output WELLKNOWN_OUTPUT] - [--good_contacts GOOD_CONTACTS] - -optional arguments: - -h, --help show this help message and exit - --https_cafile HTTPS_CAFILE - Certificate Authority file (in PEM) - --proxy_host PROXY_HOST, --proxy-host PROXY_HOST - proxy host - --proxy_port PROXY_PORT, --proxy-port PROXY_PORT - proxy control port - --proxy_ctl PROXY_CTL, --proxy-ctl PROXY_CTL - control socket - or port - --torrc TORRC torrc to check for suggestions - --timeout TIMEOUT proxy download connect timeout - --good_nodes GOOD_NODES - Yaml file of good info that should not be excluded - --bad_nodes BAD_NODES - Yaml file of bad nodes that should also be excluded - --bad_on BAD_ON comma sep list of conditions - Empty,NoEmail,NotGood - --bad_contacts BAD_CONTACTS - Yaml file of bad contacts that bad FPs are using - --saved_only Just use the info in the last *.yaml files without - querying the Tor controller - --strict_nodes {0,1} Set StrictNodes: 1 is less anonymous but more secure, - although some onion sites may be unreachable - --wait_boot WAIT_BOOT - Seconds to wait for Tor to booststrap - --points_timeout POINTS_TIMEOUT - Timeout for getting introduction points - must be long - >120sec. 
0 means disabled looking for IPs - --log_level LOG_LEVEL - 10=debug 20=info 30=warn 40=error - --bad_sections BAD_SECTIONS - sections of the badnodes.yaml to use, in addition to - BadExit, comma separated - --white_onions WHITE_ONIONS - comma sep. list of onions to whitelist their - introduction points - BROKEN - --torrc_output TORRC_OUTPUT - Write the torrc configuration to a file - --hs_dir HS_DIR Parse the files name hostname below this dir to find - Hidden Services to whitelist - --notice_log NOTICE_LOG - Parse the notice log for relays and services - --relays_output RELAYS_OUTPUT - Write the download relays in json to a file - --wellknown_output WELLKNOWN_OUTPUT - Write the well-known files to a directory - --good_contacts GOOD_CONTACTS - Write the proof data of the included nodes to a YAML - file - -This extends nusenu's basic idea of using the stem library to dynamically -exclude nodes that are likely to be bad by putting them on the ExcludeNodes or -ExcludeExitNodes setting of a running Tor. * -https://github.com/nusenu/noContactInfo_Exit_Excluder * -https://github.com/TheSmashy/TorExitRelayExclude The basic idea is to exclude -Exit nodes that do not have ContactInfo: * -https://github.com/nusenu/ContactInfo-Information-Sharing-Specification That -can be extended to relays that do not have an email in the contact, or to -relays that do not have ContactInfo that is verified to include them. diff --git a/support_onions.py b/support_onions.py index abdc325..426c1fd 100644 --- a/support_onions.py +++ b/support_onions.py @@ -33,39 +33,18 @@ bHAVE_TORR = shutil.which('tor-resolve') # in the wild we'll keep a copy here so we can avoid restesting yKNOWN_NODNS = """ --- - - 0x0.is - - a9.wtf - - apt96.com - - axims.net - - backup.spekadyon.org - - dfri.se - - dotsrc.org - - dtf.contact - - ezyn.de - - for-privacy.net - - galtland.network + - a9.wtf - heraldonion.org - - interfesse.net - - kryptonit.org - linkspartei.org - - mkg20001.io - - nicdex.com - - nx42.de - pineapple.cx - privacylayer.xyz - - privacysvcs.net - prsv.ch - - sebastian-elisa-pfeifer.eu - thingtohide.nl - - tor-exit-2.aa78i2efsewr0neeknk.xyz - - tor-exit-3.aa78i2efsewr0neeknk.xyz + - tor-exit-2.aa78i2efsewr0neeknk.xyz + - tor-exit-3.aa78i2efsewr0neeknk.xyz - tor.dlecan.com - - tor.skankhunt42.pw - - transliberation.today - tuxli.org - - unzane.com - verification-for-nusenu.net - - www.defcon.org """ # - 0x0.is # - aklad5.com @@ -241,8 +220,7 @@ def lIntroductionPoints(controller=None, lOnions=[], itimeout=120, log_level=10) l += lp except (Empty, Timeout,) as e: # noqa LOG.warn(f"Timed out getting introduction points for {elt}") - except stem.DescriptorUnavailable as e: - LOG.error(e) + continue except Exception as e: LOG.exception(e) return l
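For reference, a minimal sketch of the uri-rsa check described above: download ```/.well-known/tor-relay/rsa-fingerprint.txt``` from the ContactInfo url over the local Tor SOCKS proxy and confirm the relay's fingerprint is listed. It assumes ```requests``` with SOCKS support (pysocks) and a proxy on 127.0.0.1:9050; the domain and fingerprint in the usage comment are placeholders, not real relays.

```
# Sketch of the uri-rsa proof check; not part of the diff above.
# The domain/fingerprint in the example call are placeholders.
import requests

def check_uri_rsa(domain, fingerprint,
                  proxy='socks5h://127.0.0.1:9050',
                  cafile='/etc/ssl/certs/ca-certificates.crt',
                  timeout=20):
    """True if the relay fingerprint appears in the domain's well-known file."""
    url = f"https://{domain}/.well-known/tor-relay/rsa-fingerprint.txt"
    resp = requests.get(url, proxies={'https': proxy},
                        verify=cafile, timeout=timeout)
    resp.raise_for_status()
    fps = []
    for line in resp.text.splitlines():
        line = line.strip()
        # one 40-character fingerprint per line; '#' lines are comments
        if line and not line.startswith('#') and len(line) == 40:
            fps.append(line.upper())
    return fingerprint.upper() in fps

# example (placeholder values):
# check_uri_rsa('example.org', '0000000000000000000000000000000000000007')
```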