anyone know sqlite?

Discussion in 'Tech Heads' started by AgelessDrifter, Sep 16, 2017.

  1. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    42,279
    I've got this assignment where I've got a huge dataset of arrest records from several decades ago, and I'm supposed to use it to make inferences about homelessness somehow. The arrest records are in a database, jail.db. There's also a list of homeless shelters in a separate csv file in another folder, but not in the database. I'm trying to get a count of how many arrest records in jail.db have addresses that match addresses in addresses.csv. All the data is on a remote server with no more storage on it, and is read-only. So I'm trying to find a way to pipe the addresses from addresses.csv into a select WHERE query, either in sql directly or by piping with bash. But my linux is rusty and I have no fucking clue about sql, really.

    Is there a way to do this? Or another inroad to the problem that makes more sense?
     
  2. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    43,553
    it sounds like you're trying to compute an inner join on address

    why do you have to use pipes? why not just load both tables into whatever sql interface you're using and call inner join on them to get a new table organized as you want it? are they too big to store in RAM?

    also can you not download a copy of the data from the remote server to your home pc? why are there so many weird restrictions on how the data is accessed on the remote server?
     
  3. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    42,279
    There may not be--I just don't know what I'm doing. I haven't tried sql on my own (windows) machine yet and I dunno how different mysql is from sql--since I have to submit the code at the end I hesitate to switch systems. I don't know what I don't know, ya know.

    I was able to make a database and import the addresses into it, then attach the arrest records, so basically what you suggested, I think.

    Since you're here, is there a simple way to do something like: for each value that appears in the addresses column, count the number of entries that have that identical value, then return a table with the values in one column and the counts in another?

    I'm not asking anyone to come up with the code for me, I'm just curious if this is a type of thing that even makes sense to think about doing with sql. I assume it is, but the "for" loop-ish part of it is outside of anything we were shown in examples and I have literally zero exposure to sql
     
  4. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    43,553
    im not an sql expert by any means either, dont often have much reason to use it. i basically just spent some time studying it for an interview and that's about it. i think you're looking to do a combo of group by and count, though, like this:

    https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_groupby

    i think for this stuff mysql will be the same as most other sql implementations, but usually when you google sql commands they tell you which implementations the particular syntax is good for. do you know how the sql you're submitting is going to be tested? will a human prof actually look at it, will run they it through a specific sql variant and just mindlessly yell if it throws errors?

    i'm unsure, depends on how you did the attaching/importing

    i was suggesting you do an inner join (i.e. merge the two databases into one using addresses to link records), but i'm not sure how much sense that makes if addresses may appear in duplicate (which it sounds like they might if you're trying to count how often each address appears). i'm also not sure how the different kinds of outer joins would behave in that case. they mostly differ in how they handle records that don't appear in at least one of the two tables being merged. im not really sure how they would handle non-unique merge keys tho, woudl have to google that
     
  5. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    42,279
    That group by example is exactly what I needed, thanks--that's a real life-saver.

    This is only week two of the program, and the first assignment from this class, so I don't know much at all of what to expect about how this stuff will be evaluated. I do doubt the code will get run, though.

    Yeah the join thing wouldn't work, I think--plausibly tons of recurring addresses (they're the addresses of homeless shelters, so the working theory is we can get a look at how many of the arrests were of homeless people by looking for ones who listed them as their addresses. Ideally there'd be a bunch for each one. In practice it turns out there are literally none--the only exception being " "--no address at all, which appears in a couple dozen thousand arrests).
     
  6. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    43,553
    the sql variants dont differ dramatically; if the prof is just looking at your sql & not actually running it i wouldnt worry about the diff between mysql and whatever commercial variant the prof might be most familiar w/

    on googling it i think the (inner/outer/left/right) join would have worked, they seem to just take all relevant pairs when duplicates occur, returning multiple records rather than collapsing them:

    https://stackoverflow.com/questions/10939090/inner-join-returning-large-numbers-of-duplicates

    weird that the blank is the only recurring address. i wonder if holy or red or one of the other cop-like ppl might have some insight into why
     
  7. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    42,279
    Well the only recurring one from the list of addresses of homeless shelters was blank. Actually by sorting the grouped list and looking at the highest rate of recurrences I discovered that there was a distinct "HOMELESS" value that recurs a few thousand times, as well as "confidential" (?). Then there are a lot of random ones that recur hundreds or thousands of times but don't seem to be shelters. How odd
     
  8. Jackpanel

    Jackpanel TZT Abuser

    Post Count:
    6,496
    Relying on exact matches of manually typed addresses will give horrible results. There's way too much variation in how the same address could be entered:

    123 Main Street
    123 main st.
    123 main st
    123 main st unit 789
    123-789 main str N

    If this is a simple intro SQL exercise, a simple join/group by/count is likely all they're looking for. If you were legit trying to analyze this though, you would run some sort of cleanup and parsing of the addresses - extract numbers into a separate column, remove spaces and punctuation, pattern replace common abbreviations (st. => street), etc
     
    Solayce likes this.
  9. Red

    Red TZT Neckbeard

    Post Count:
    12,585
    And jackpanel correctly guessed where all the homeless are arrested in our county
     
    Agrul likes this.
  10. Solayce

    Solayce Would you like some making **** BERSERKER!!! Staff Member

    Post Count:
    21,626
    :evil:
     
  11. Solayce

    Solayce Would you like some making **** BERSERKER!!! Staff Member

    Post Count:
    21,626
    glad agrul was here to help. I didn't make much progress beyond my first couple of weeks/assignments in database class - probably because Katrina had just happened. I didn't have a lot of time to study and could wrap my head around the INNER/OUTER/LEFT/RIGHT JOINS.
     
  12. Havlen

    Havlen TZT Abuser

    Post Count:
    9,783
    Didn't see this post as I don't venture out of General that much besides the dev forum. For future reference, there's a decent SQLite manager that you can download as an add on to Firefox. It's the main reason I have Firefox on my computer. Let's you browse and create objects (tables, views, etc.), run SQL commands, import delimited files, etc. All the basic stuff. Isn't the greatest manager in the world, but the best and easiest I've found for SQLite.
     
  13. Utumno

    Utumno Administrator Staff Member

    Post Count:
    36,127
    ur mom knows sqlite
     
  14. Klerick

    Klerick I have loved some ladies, I have loved Jim Beam...

    Post Count:
    11,929
    Pretty sure she's sqheavy
     
  15. Sifter

    Sifter TZT Addict

    Post Count:
    2,741
    I kinda miss SQL sometimes. NoSQL is a lot more work + thinking about usecases.