halp w/curl command

Discussion in 'Tech Heads' started by Utumno, Jan 2, 2019.

  1. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
  2. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    Alternatively, if there's some pythony way to do this easily I could try that as well.

    If it's rly complicated and a pain in the ass either way then just tell me I lack the basics to be able to do this and need to go git gud first.
     
  3. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    43,909
    This is something I really should know how to do by now but don’t

    I hear Puppeteer is really good for scraping sites with stubborn APIs
     
  4. Chemosh

    Chemosh TZT Addict

    Post Count:
    4,297
    Hit me up tomorrow on fb and I'll get it for you. I'm pretty good with curl
     
  5. Chemosh

    Chemosh TZT Addict

    Post Count:
    4,297
    Also, a tip if you don't know: go into your developer tools on the page and open the Network tab. Clear the data in Network, select persist connections, then make the POST request via the webpage. Once you see the query, right-click it in dev tools and Copy as cURL (bash).
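    If you want the pythony version of whatever Chrome spits out, the same captured POST can be replayed with requests. Rough sketch only -- the field names below are examples for a typical ASP.NET page, so copy the real names and values out of the request you captured:

    import requests

    url = "http://shfinmatesearch.solanocounty.com/ListOfInmates.aspx"

    session = requests.Session()
    # GET the page first so the server hands us any cookies it expects back
    session.get(url)

    form_data = {
        "tx_Search": "rock",                         # the search box value
        "__VIEWSTATE": "PASTE_FROM_DEVTOOLS",        # ASP.NET state token (guess)
        "__EVENTVALIDATION": "PASTE_FROM_DEVTOOLS",  # ASP.NET validation token (guess)
    }
    result = session.post(url, data=form_data)
    print(result.status_code)
    print(result.text[:500])  # first chunk of the HTML that comes back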
     
  6. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    45,824
  7. Jackpanel

    Jackpanel TZT Abuser

    Post Count:
    6,832
    As someone who has spent way too many hours of his life battling website scrapers, I hope you all get Lyme disease.
     
  8. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    :smitten:
     
  9. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    45,824
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.keys import Keys

    link = "http://shfinmatesearch.solanocounty.com/ListOfInmates.aspx"
    driver = webdriver.PhantomJS()     # headless browser, no window needed
    driver.set_window_size(1120, 550)
    wait = WebDriverWait(driver, 10)   # handy if the page needs explicit waits
    driver.get(link)

    # type 'rock' into the search box and submit, same as hitting enter
    textEntry = driver.find_element_by_id('tx_Search')
    textEntry.send_keys('rock')
    textEntry.send_keys(Keys.ENTER)
    driver.save_screenshot('screenshot.png')
    driver.quit()                      # shut the phantomjs process down


    saves screenshot.png that looks like:

    [attached image: screenshot.jpg]
     
  10. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    Thanks, I think this works. I was already using developer tools in Chrome to view individual page elements, but I didn't think to use the Network tab. One slight change from your instructions (I think): there is no "persist connections" checkbox, though there is a "Preserve log" checkbox, which may do the same thing.

    That's super-handy that Chrome will just formulate the curl command for you. It feels like cheating.

    Chemosh has now taught me something new and I am shamed + grateful.
     
  11. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    Oh look at Agrul getting all fancy using selenium and saving out to a png.
     
  12. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    thx pals for helping me track my misguided brother + reminding Jackpanel why we can't have nice things.
     
  13. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    I guess the next interesting thing to do (aside from just running the curl command repeatedly) would be to send an email to myself whenever this triggers, but save state so that it doesn't keep triggering every day/hour or whatever interval I scrape at. Then maybe send me another email when his name drops off the list, and reset things so it's prepped for the next time he's in the slammer.
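    Rough sketch of what I'm picturing, run from cron on my home box or whatever. The addresses/SMTP bits are placeholders and the "is he listed" check is just a dumb substring match on the page HTML:

    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    import requests

    URL = "http://shfinmatesearch.solanocounty.com/ListOfInmates.aspx"
    NAME = "LASTNAME, FIRSTNAME"       # whoever you're watching for
    STATE = Path("inmate_state.txt")   # exists == we already sent the "he's in" email

    def name_is_listed():
        # dumb but effective: fetch the list page and look for the name;
        # a POST with the search form would be more precise
        html = requests.get(URL).text
        return NAME.lower() in html.lower()

    def send_mail(subject, body):
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = "me@example.com"   # placeholder
        msg["To"] = "me@example.com"     # placeholder
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)

    listed = name_is_listed()
    was_listed = STATE.exists()

    if listed and not was_listed:
        send_mail("in the slammer", NAME + " just showed up on the list")
        STATE.touch()    # arm, so we don't email again every interval
    elif not listed and was_listed:
        send_mail("released", NAME + " dropped off the list")
        STATE.unlink()   # reset for the next time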
     
  14. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    45,824
    can we talk about the fact that your brother's prison houses the man, the legend

    ROCKY LEE MUSIC
     
  15. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    Rocky aint never done nothing wrong. He's just in there because he ain't listenin' to the MAN
     
  16. Chemosh

    Chemosh TZT Addict

    Post Count:
    4,297
    My thoughts on this.

    Use AWS Lambda with a CloudWatch Events rule triggered daily. Use node.js or python. Have it query the jail site and see if something has changed (md5sum, release date, etc.). If so, check against a marker object you leave in an S3 bucket to validate it and, if it really changed, send an email with SES and update the S3 object to reflect the change
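    Ballpark of what that lambda looks like in python. Bucket name and emails are placeholders, SES makes you verify the source address first, and the "something changed" test here is just an md5 of the page like I said:

    import hashlib
    import urllib.request

    import boto3

    URL = "http://shfinmatesearch.solanocounty.com/ListOfInmates.aspx"
    BUCKET = "my-jail-watch-bucket"   # placeholder
    KEY = "last_hash.txt"             # the marker object in S3

    s3 = boto3.client("s3")
    ses = boto3.client("ses")

    def handler(event, context):
        # CloudWatch Events fires this once a day
        page = urllib.request.urlopen(URL).read()
        new_hash = hashlib.md5(page).hexdigest()

        try:
            old_hash = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode()
        except s3.exceptions.NoSuchKey:
            old_hash = ""   # first run, nothing stored yet

        if new_hash != old_hash:
            ses.send_email(
                Source="me@example.com",   # must be verified in SES
                Destination={"ToAddresses": ["me@example.com"]},
                Message={
                    "Subject": {"Data": "jail list changed"},
                    "Body": {"Text": {"Data": "go check " + URL}},
                },
            )
            # update the marker so tomorrow's run doesn't re-alert
            s3.put_object(Bucket=BUCKET, Key=KEY, Body=new_hash)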
     
  17. Utumno

    Utumno Administrator Staff Member

    Post Count:
    41,193
    those things cost money tho right? i mean even if it's a trivial amount, i could just as well script something on my home box to do the scraping. i don't need to give bezos any more of my cash than i already do (which is most of it lol)
     
  18. Chemosh

    Chemosh TZT Addict

    Post Count:
    4,297
    Lambda, S3 and SES all have free tiers. You'd be looking at less than 1c/month
     
  19. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    45,824
    i use s3 for all my long-term storage and it's cheap as fuck, most months they charge me $0.01 - $0.10

    this month they charged me $2 ur prices are out of control chemosh
     
  20. Chemosh

    Chemosh TZT Addict

    Post Count:
    4,297
    Use better storage classes
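    It's just the StorageClass you set on upload. Something like this with boto3 (bucket/key are placeholders):

    import boto3

    s3 = boto3.client("s3")
    # STANDARD_IA (infrequent access) runs roughly half the per-GB rate of
    # STANDARD; the GLACIER classes are cheaper still but slow to retrieve
    with open("backup.tar.gz", "rb") as f:
        s3.put_object(
            Bucket="my-backup-bucket",    # placeholder
            Key="backups/2019-01.tar.gz",
            Body=f,
            StorageClass="STANDARD_IA",
        )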