Just my blog
A blog about everything, mostly about tech things I've made. Here is the list of tools and libraries I use for the projects on this blog. Feel free to ask me about the implementations.
- MobaXterm: SSH, RDP, FTP...
- Thunderbird: email client
- FileZilla: FTP client/server
- NirSoft: Windows utilities
- Sysinternals: Windows utilities
- Pi-hole: ad blocking via DNS
- NUT: UPS manager
- Rpi MON: Raspberry Pi monitoring
- FreeCAD: 3D modelling
- FreeCommander: Far Manager-like file manager
- Bitwarden: password manager
- Django: web framework
- celery: multitasking (asynchronous task queue)
- celery-beat: Celery + Django (periodic tasks)
- celery-results: Celery + Django (task results)
- Pillow: Python imaging library
- mod_wsgi: Apache + Python
- requests: the best library for HTTP requests
- openpyxl: create Excel documents
- p4python: Perforce + Python
- paramiko: SSH + Python
- pyVmomi: ESXi/vCenter + Python
I use all of these, so feel free to ask me about them.
Python HTMLParser and Vkontakte randomizer
I've finally finished my first "program" in Python. The task is to parse people's IDs from a web page where the IDs of everyone who reposted a post are stored. The main problems were:
- The page content is loaded dynamically, so there is no simple way to get the IDs from it directly. The best solution was to save the section that contains the IDs to an .html file.
- I wanted to capture ID + nickname pairs, but a list of pairs didn't work well when picking a random entry.
- I couldn't create a list that kept all found IDs; it was wiped on every iteration.
- Some nicknames contain unsupported characters, and they broke the iteration.
- Scanning the .html file gave me a lot of junk, so I used regex to filter it out.
- I couldn't add different IDs to the list without the same ID being added over and over, and guess what? Yes, it broke the iteration.
What I've learned: here is a long list of topics, mostly for indexing so I can search for them later. [su_spoiler title="List of topics"] (let Google index it, so you can find this in the future)
- How to open a file in Python
- How to make global variables in Python
- How to parse HTML in Python
- How to filter a variable with regex in Python
- re.findall in Python
- re.match in Python
- The 'for' construction in Python
- How to replace a character in Python
- The 'if' construction in Python
- The 'else' construction in Python
- What is a string in Python
- What is a list in Python
- How to add something to a list in Python
- list.append in Python
- list.extend in Python
- list.insert in Python
- How to export data to CSV in Python
- How to pick a random item in Python
- How to print in Python
- How to remove unprintable symbols in Python
- Convert a string to a list in Python
[/su_spoiler] [su_quote]I will show you my drafts. Some of them may not look clear or readable, but please don't blame me: I started from nothing and didn't read any guide like 'Python for gentlemen', so my code may look rough.[/su_quote] Here is my 'very last, last try', where all of the topics appear:
```python
with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()

'''
1. Replaced error with charset by replace character
2. Change the way how print was formatted
2.1 Added random - but still not used
3. Added CSV export tool
'''

from html.parser import HTMLParser
import re, sys, random, csv

'''
Global variables here
global vk_read
global vk_name
global men
'''

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        vk_id = str(attrs)
        for line in vk_id:
            vk = re.findall('/\S+$', vk_id)
            vk_fnd = str(vk)
            if re.search('/\w+\'\)\]', vk_fnd):
                global vk_read
                vk_read = vk_fnd
                for ch in ['/', ')', '[', ']', '"', "'"]:
                    if ch in vk_read:
                        vk_read = vk_read.replace(ch, "")
            else:
                pass

    def handle_data(self, data):
        global vk_name
        vk_name = str(data)
        for line in vk_name:
            if re.match('\S+\s+\S+$', vk_name):
                for ch in ['\u0456', '\u0406']:
                    if ch in vk_name:
                        vk_name = vk_name.replace(ch, "?")
                if vk_name:
                    if vk_read:
                        global men
                        men = '@'+vk_read+' '+vk_name
                        print(men)
                        # men = list('@'+vk_read+' '+vk_name)
                        # men_list = men.split()
                        # men_list.append(men_list)
                        with open('vk_winners.csv', 'w', encoding='utf-8', newline='') as csvfile:
                            write = csv.writer(csvfile, delimiter=' ')
                            for _ in men:
                                write.writerow([men])
                                break
                        break
                    else:
                        print('ERROR no id found')
                else:
                    print('ERROR no name found')
            else:
                break

parser = MyHTMLParser()
parser.feed(read_data)
```
So you can see what I'm trying to do and how. At the end of this post you'll see the final version that worked. Let's go through the topics one by one:
How to open a file in Python
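```python
with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()
print(content_file.closed)  # True: the with block has already closed the file

parser = MyHTMLParser()  # the HTMLParser subclass from the full script
parser.feed(read_data)
```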
This opens the file for reading. Reading text can produce encoding errors, so I added encoding='utf-8' to avoid them. There is no need to close the file by hand: the with block closes it automatically as soon as the block ends, which is why content_file.closed is True afterwards.
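A minimal sketch to check this yourself, assuming a throwaway file in a temporary directory (the path and its contents are made up for the example): it writes Cyrillic text with encoding='utf-8', reads it back the same way, and shows that the file is already closed after the with block.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, with Cyrillic text like VK nicknames.
path = os.path.join(tempfile.mkdtemp(), 'test.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<a href="/yana_lyubchenko">Яна Любченко</a>')

with open(path, 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()

print(content_file.closed)  # True: closed automatically when the with block ended
print(read_data)
```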
How to make global variables in Python
global vk_read
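Put the global statement inside the function (in my case, inside the parser method) before you assign a value to the variable. Without it, an assignment inside a function creates a new local variable, and the global one stays unchanged.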
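A small sketch of why global is needed; the counter variable and the increment() function are hypothetical, just for illustration:

```python
counter = 0  # hypothetical module-level (global) variable

def increment():
    # Without this line, 'counter += 1' raises UnboundLocalError,
    # because assigning to a name inside a function makes it local.
    global counter
    counter += 1

increment()
increment()
print(counter)  # 2
```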
How to parse HTML in Python
In Python 3 (I used 3.4) you can use the HTMLParser class from the html.parser module. It goes through the raw HTML and calls your handler methods (handle_starttag, handle_data and so on) for the tags and text it finds, so all that is left to you is filtering the results. I filtered them with lists and regex. Read the docs carefully: I struggled a lot because I missed this line: 'The attrs argument is a list of (name, value) pairs'.
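Since attrs is already a list of (name, value) pairs, you can read the href directly instead of converting attrs to a string and running regex over it. This is a sketch of that alternative, not what my script does; HrefCollector and the sample HTML are made up for the example:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, e.g. [('href', '/dimka_keystin')]
        for name, value in attrs:
            if tag == 'a' and name == 'href' and value:
                self.hrefs.append(value)

sample = ('<a href="/dimka_keystin" class="like_img_cont"><img src="x.jpg"></a>'
          '<a href="/yana_lyubchenko">Яна Любченко</a>')
parser = HrefCollector()
parser.feed(sample)
print(parser.hrefs)  # ['/dimka_keystin', '/yana_lyubchenko']
```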
How to filter a variable with regex in Python
In my case I filter in three different ways: re.findall('/\S+$', href), then re.search('/\w+\'\)\]', id_raw) inside an if, and finally re.match('\S+\s+\S+$', vk_name) inside another if.
- re.findall finds every substring that matches the pattern and returns them all as a list; it doesn't sort or drop anything else. With the pattern '/\S+$' it returns a '/' followed by non-space characters that run to the very end of the string (here href is str(attrs)), and I keep that for further processing.
  Before:
  [('href', '/dimka_keystin')] [('class', 'like_row_cont inl_bl')] [('href', '/yana_lyubchenko'), ('class', 'like_img_cont')] [('width', '100'), ('height', '100'), ('src', 'https://pp.vk.me/c625428/v625428926/2f4c9/EGjgXLGiMkg.jpg')] [] [('href', '/yana_lyubchenko')] [('class', 'like_row_cont inl_bl')] [('href', '/id168233095'), ('class', 'like_img_cont')] [('width', '100'), ('height', '100'), ('src', 'https://pp.vk.me/c412728/v412728095/33b3/Q9scL5rbFWM.jpg')]
  After:
  ["//pp.vk.me/c625730/v625730549/2b9fe/nG8MaWjdEeA.jpg')]"] [] ["/dimka_keystin')]"] [] [] ["//pp.vk.me/c625428/v625428926/2f4c9/EGjgXLGiMkg.jpg')]"] [] ["/yana_lyubchenko')]"]
- re.search checks whether the pattern occurs anywhere in the string. Here it keeps only the results that look like a profile link (a '/' and word characters, then a quote, ')' and ']'), so the image URLs are dropped.
  Found by pattern:
  ["/dimka_keystin')]"] ["/yana_lyubchenko')]"]
  Then each unneeded character is replaced with an empty string, one character type per step:
  ["dimka_keystin')]"] → ["dimka_keystin']"] → "dimka_keystin']"] → "dimka_keystin'" → dimka_keystin' → dimka_keystin
  ["yana_lyubchenko')]"] → ["yana_lyubchenko']"] → "yana_lyubchenko']"] → "yana_lyubchenko'" → yana_lyubchenko' → yana_lyubchenko
- if re.match(...) checks the pattern only at the start of the string, so '\S+\s+\S+$' accepts only strings made of two words separated by whitespace, like a first and last name. handle_data(self, data) receives a lot of empty and whitespace-only strings, and this filters them out. I also replaced the characters that caused charset errors, as in the example above.
  Before:
  Димон Димоныч ... Яна Любченко ...
  After:
  Димон Димоныч Яна Любченко
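The three regex steps can be tried on their own with sample strings. This is a sketch: the attr_strings list is copied from the "Before" output above, while the texts list is partly made up (the empty and whitespace strings and 'Like' are hypothetical junk data). I used raw strings (r"...") for the patterns, which is the usual way to write regex in Python and avoids invalid-escape warnings in newer versions:

```python
import re

# Sample str(attrs) values, copied from the "Before" output.
attr_strings = [
    "[('href', '/dimka_keystin')]",
    "[('class', 'like_row_cont inl_bl')]",
    "[('href', '/yana_lyubchenko'), ('class', 'like_img_cont')]",
    "[('width', '100'), ('height', '100'), ('src', 'https://pp.vk.me/c625428/v625428926/2f4c9/EGjgXLGiMkg.jpg')]",
    "[('href', '/yana_lyubchenko')]",
]

found = []
for href in attr_strings:
    # findall: '/' plus non-space characters running to the end of the string
    id_tag = re.findall(r"/\S+$", href)
    # search: keep only profile links such as "/dimka_keystin')]"
    if re.search(r"/\w+'\)\]", str(id_tag)):
        found.append(id_tag[0])
print(found)  # ["/dimka_keystin')]", "/yana_lyubchenko')]"]

# match: only strings that are exactly two whitespace-separated words
texts = ['Димон Димоныч', '\n', '   ', 'Яна Любченко', 'Like']
names = [t for t in texts if re.match(r"\S+\s+\S+$", t)]
print(names)  # ['Димон Димоныч', 'Яна Любченко']
```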
The 'for' construction in Python
```python
for ch in ['\u0456', '\u0406']:
    if ch in vk_name:
        vk_name = vk_name.replace(ch, "?")
```
A for loop runs its body once for each item of an iterable (the characters of a string, the items of a list, the lines of a file, and so on) until the items run out. Use break to leave the loop early once you've found what you were looking for.
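Note that looping over a string gives you its characters, not lines or words (so a loop like for line in href, where href is a string, actually walks through characters). A short sketch with made-up values:

```python
# Looping over a string yields single characters.
chars = []
for item in "vk_id":
    chars.append(item)
print(chars)  # ['v', 'k', '_', 'i', 'd']

# Looping over a list yields its items; break stops at the first match.
found = None
for name in ['dimka_keystin', 'yana_lyubchenko', 'id168233095']:
    if name.startswith('yana'):
        found = name
        break
print(found)  # yana_lyubchenko
```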
The 'if' construction in Python
```python
# THIS
for ch in ['\u0456', '\u0406']:
    if ch in vk_name:
        vk_name = vk_name.replace(ch, "?")

# OR THIS
if re.search('/\w+\'\)\]', vk_fnd):
    ...  # body omitted

# OR THIS: compare the whole ID, not each of its characters
if vk_read not in vk_ids:
    vk_ids.append(vk_read)
```
An if statement runs its block only when the condition is true: a value is found, a regex pattern matches, an item is not in a list yet. elif ("else if") adds another condition, which is checked only if all the conditions before it were false.
The 'else' construction in Python
else runs its block when none of the preceding if/elif conditions were true: the value was not found, the pattern did not match, or the item was already in the list.
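A sketch that puts if, elif and else together; classify() and its three categories are hypothetical, made up to show the flow:

```python
import re

def classify(href):
    # Hypothetical categories, just to show if / elif / else.
    if re.fullmatch(r"/id\d+", href):
        return 'numeric id'
    elif re.fullmatch(r"/\w+", href):
        return 'nickname'
    else:
        return 'not a profile link'

print(classify('/id168233095'))      # numeric id
print(classify('/yana_lyubchenko'))  # nickname
print(classify('https://pp.vk.me/c625428/v625428926/2f4c9/EGjgXLGiMkg.jpg'))  # not a profile link
```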
How to replace a character in Python
and
How to remove unprintable symbols in Python
Simple example:
Remove every character from ['/', ')', '[', ']', '"', "'"] that appears in '["/dimka_keystin')]"]':
```python
for ch in ['/', ')', '[', ']', '"', "'"]:
    if ch in vk_read:
        vk_read = vk_read.replace(ch, "")
```
Replace the characters in names that caused charset errors (here the Ukrainian letters і and І, \u0456 and \u0406) with '?':
```python
for ch in ['\u0456', '\u0406']:
    if ch in vk_name:
        vk_name = vk_name.replace(ch, "?")
```
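The same cleanup can also be done with str.translate, which deletes or replaces all listed characters in one pass, and str.isprintable() can drop characters that are not printable. This is an alternative technique, not what my script uses; the sample strings are made up:

```python
# Delete every listed character in one pass.
raw = '["/dimka_keystin\')]"]'
table = str.maketrans('', '', '/)[]"\'')
print(raw.translate(table))  # dimka_keystin

# Replace Ukrainian і/І with '?' (as in the post), then drop non-printable characters.
name = 'Д\u0456ма\n'
name = name.translate(str.maketrans({'\u0456': '?', '\u0406': '?'}))
clean = ''.join(ch for ch in name if ch.isprintable())
print(clean)  # Д?ма
```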
How to add something to a list in Python
I found several ways while working on this, but the best one for my task is list.append(): it adds a value to the end of the list, so the list collects every value found, which is exactly what this task needs.
```python
# Compare the whole ID: looping over vk_read would go character by character.
if vk_read not in vk_ids:
    vk_ids.append(vk_read)
```
list.insert(i, x) puts a value at any position in the list. Nothing is overwritten: the items after that position shift one place to the right, which is why it is slower than append on long lists. list.extend(L) appends every item of an iterable to the end of the list. That is not useful for my task: extending with a string adds each of its characters separately, so random.choice could then pick something other than a whole ID.
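A quick comparison of the three methods, with made-up values:

```python
vk_ids = ['dimka_keystin']

vk_ids.append('yana_lyubchenko')   # add one item to the end
print(vk_ids)  # ['dimka_keystin', 'yana_lyubchenko']

vk_ids.insert(0, 'id168233095')    # insert at index 0; the rest shift right, nothing is lost
print(vk_ids)  # ['id168233095', 'dimka_keystin', 'yana_lyubchenko']

letters = []
letters.extend('abc')              # extend adds each element of the iterable: here, characters
print(letters)  # ['a', 'b', 'c']

nested = []
nested.append(['x', 'y'])          # append adds the whole list as a single item
print(nested)  # [['x', 'y']]
```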
How to export data to csv in Python
In my case I just assign the result I need to a variable and write it to the file. Here I write random_id, picked from the list of IDs vk_ids, but the same approach works for any variable. The brackets in write.writerow([random_id]) matter: writerow expects a sequence of fields, so [random_id] makes a row with one field, while a bare string would be split into one field per character. To export all found IDs instead, use write.writerow(vk_ids) for a single row, or one write.writerow([vk_id]) call per ID for one ID per line.
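```python
import csv
import random

random_id = random.choice(vk_ids)
# newline='' is what the csv docs recommend for files passed to csv.writer
with open('vk_winners.csv', 'w', encoding='utf-8', newline='') as csvfile:
    write = csv.writer(csvfile, delimiter=' ')
    write.writerow([random_id])
```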
I don't call close() here either: the with block closes the file as soon as the block ends.
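A sketch of both export variants, assuming an in-memory io.StringIO buffer in place of vk_winners.csv so nothing is written to disk; the IDs are sample values from this post. It also shows why the brackets matter:

```python
import csv
import io
import random

vk_ids = ['dimka_keystin', 'yana_lyubchenko', 'id168233095']

buf = io.StringIO()  # stands in for the opened vk_winners.csv in this sketch
write = csv.writer(buf, delimiter=' ')
write.writerow([random.choice(vk_ids)])       # one row with one field: the winner
write.writerows([vk_id] for vk_id in vk_ids)  # one row per ID
lines = buf.getvalue().splitlines()
print(lines)

# Without brackets, writerow treats the string as a sequence of characters.
pitfall = io.StringIO()
csv.writer(pitfall, delimiter=' ').writerow('dimka')
print(repr(pitfall.getvalue()))  # 'd i m k a\r\n'
```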
How to pick a random item in Python
As described above, just pass the list to random.choice(), which returns one randomly chosen item (it raises IndexError if the list is empty):
```python
if vk_read not in vk_ids:
    vk_ids.append(vk_read)
random_id = random.choice(vk_ids)
```
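If you ever need more than one winner, random.sample() picks several different items without repeats. A sketch with sample IDs (the two-winner case is hypothetical; my task needs only one):

```python
import random

vk_ids = ['dimka_keystin', 'yana_lyubchenko', 'id168233095']

winner = random.choice(vk_ids)        # one random item
winners = random.sample(vk_ids, k=2)  # two different items, no repeats
print(winner, winners)

# random.choice([]) raises IndexError, so guard against an empty list.
empty = []
print(random.choice(empty) if empty else 'no IDs found')  # no IDs found
```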
Convert a string to a list in Python
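In my case I just needed to create the variable as an empty list once, before the loop (outside the handler), and then append the string from each iteration to it. If the list is created inside the loop or inside the handler method, it starts empty again on every call.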
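```python
vk_ids = []  # created once, outside the parser method, so it keeps every found ID

# ...later, inside the handler, each time a cleaned ID (vk_read) is found:
if vk_read not in vk_ids:
    vk_ids.append(vk_read)
random_id = random.choice(vk_ids)
```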
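For an actual string-to-list conversion, list() splits a string into characters and str.split() splits it into words. A sketch with made-up values, plus the collecting pattern described above:

```python
line = '@dimka_keystin Димон Димоныч'
print(list(line[:6]))  # ['@', 'd', 'i', 'm', 'k', 'a']
print(line.split())    # ['@dimka_keystin', 'Димон', 'Димоныч']

# Collecting strings: the list is created once, outside the loop.
vk_ids = []
for vk_read in ['dimka_keystin', 'yana_lyubchenko', 'dimka_keystin']:
    if vk_read not in vk_ids:
        vk_ids.append(vk_read)
print(vk_ids)  # ['dimka_keystin', 'yana_lyubchenko']
```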
That's all for now, folks, I need to go. Thanks for reading! This is how I finished it:
```python
'''
1. Replaced the charset error with a replacement character
2. Changed the way print is formatted
2.1 Added random - picks from the list of IDs
3. Added CSV export for one winner
'''
import csv
import random
import re
from html.parser import HTMLParser

with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()  # the with block closes the file on exit

vk_ids = []  # created once, outside the handler, so IDs are not wiped between calls


class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        href = str(attrs)  # e.g. "[('href', '/dimka_keystin')]"
        # '/' plus non-space characters running to the end of the string
        id_tag = re.findall(r'/\S+$', href)
        id_raw = str(id_tag)
        # keep only profile links such as "/dimka_keystin')]"; skip image URLs
        if not re.search(r"/\w+'\)\]", id_raw):
            return
        vk_read = id_raw
        for ch in ['/', ')', '[', ']', '"', "'"]:
            vk_read = vk_read.replace(ch, "")
        # http://stackoverflow.com/questions/30328193/python-add-string-to-a-list-loop
        if vk_read not in vk_ids:  # compare the whole ID, not its characters
            vk_ids.append(vk_read)


parser = MyHTMLParser()
parser.feed(read_data)

# Pick the winner once, after every ID has been collected.
if vk_ids:
    random_id = random.choice(vk_ids)
    with open('vk_winners.csv', 'w', encoding='utf-8', newline='') as csvfile:
        write = csv.writer(csvfile, delimiter=' ')
        write.writerow([random_id])
```