
Unable to parse data from Mega system #69

@mbenson182

Description


I realize this is my second issue in a week, for which I apologize, but the problem I'm having now matters much more to me than the last one, since my main goal is just to replicate the parsing and rectification steps on my own data sets. I've been able to get the read() and correct() functions working on the test data, which is about as much as I need (at least for now).

However, I've been trying to use these functions on some data my group has collected, and have been unable to get it parsed. The failure seems to happen in the calls into pyread or pyread_single when the scans are being parsed (in getmetadata() and _get_scans(), respectively).

This is the output when I try to run PyHum.read():

Input file is Rec00003.DAT
Son files are in Rec00003/
cs2cs arguments are epsg:26949
Draft: 0.3
Celerity of sound: 1450.0 m/s
Transducer length is 0.108 m
Only 1 chunk will be produced
Data is from the 2 series
Checking the epsg code you have chosen for compatibility with Basemap ...
... epsg code compatible
WARNING: Because files have to be read in byte by byte,
this could take a very long time ...
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 34.1s
[Parallel(n_jobs=2)]: Done 2 out of 4 | elapsed: 34.1s remaining: 34.1s
[Parallel(n_jobs=2)]: Done 4 out of 4 | elapsed: 34.1s remaining: 0.0s
something went wrong with the parallelised version of pyread ...
Traceback (most recent call last):
File "PyHumRead.py", line 78, in
reader()
File "PyHumRead.py", line 18, in reader
ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')
File "/home/user/miniconda2/envs/pyhum/lib/python2.7/site-packages/PyHum/_pyhum_read.py", line 427, in read
metadat = data.getmetadata()
File "PyHum/pyread.pyx", line 532, in PyHum.pyread.pyread.getmetadata
File "PyHum/pyread.pyx", line 538, in PyHum.pyread.pyread.getmetadata
TypeError: 'NoneType' object is not subscriptable
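
As a first sanity check on a single file, something like the sketch below is about as far as I can get at narrowing down where the None comes from ('Rec00003/B003.SON' is just a placeholder for one of my SON files, and I'm assuming gethumdat() doesn't depend on getscan() having been called first):

import PyHum.pyread_single as pyread_single

# single-file check: does the .DAT header parse at all before the metadata step?
data = pyread_single.pyread('Rec00003/B003.SON', 'Rec00003.DAT',
                            1450.0, 2, 'epsg:26949')
print(data.gethumdat())   # None or an empty result here would point at the .DAT header parse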

I don't particularly care whether the Parallel path itself works, but the interesting thing is that it appears to run to completion (all four tasks finish) and then crashes while wrapping up. Anyway, I wrote a script that runs the same parsing as the Parallel call but single-threaded, without actually calling Parallel (since the code executed in the except: block is apparently different from that in the try: block). The code is:

import PyHum as ph
import glob, sys
import os
import pdb

import PyHum.utils as humutils
import PyHum.pyread_single as pyread_single

def reader():
	humfile = "Rec00003.DAT'"
	sonpath = '"Rec00003/"
	c = 1450.0;
	draft = 0.3;
	t = 0.108;
	model = 2; #2


	# ph.read(humfile,sonpath,'epsg:26949',c,draft,0,t,0,0,model,0,0,'100m')

	# get the SON files from this directory
	sonfiles = glob.glob(sonpath+'*.SON')
	if not sonfiles:
		sonfiles = glob.glob(os.getcwd()+os.sep+sonpath+'*.SON')

	base = humfile.split('.DAT') # get base of file name for output
	base = base[0].split(os.sep)[-1]

	# remove underscores, negatives and spaces from basename
	base = humutils.strip_base(base)

	print("WARNING: Because files have to be read in byte by byte,")
	print("this could take a very long time ...")

	# Single-threaded version of Parallel call
	X = []; Y = []; A = []; B = [];
	for k in range(len(sonfiles)):
		X[k], Y[k], A[k], B[k] = getscans(sonfiles[k], humfile, c, model, "epsg:26949")




def getscans(sonfile, humfile, c, model, cs2cs_args):

   data = pyread_single.pyread(sonfile, humfile, c, model, cs2cs_args)

   a, b = data.getscan()

   if b == 'sidescan_port':
      dat = data.gethumdat()
      metadat = data.getmetadata()
   else:
      dat = None
      metadat = None

#    return a, b, dat, metadat


if __name__ == '__main__':
	reader()

The output when I run this code block is:

WARNING: Because files have to be read in byte by byte,
this could take a very long time ...
Traceback (most recent call last):
File "PyHumRead.py", line 77, in
reader()
File "PyHumRead.py", line 37, in reader
X[k], Y[k], A[k], B[k] = getscans(sonfiles[k], humfile, c, model, "epsg:26949")
File "PyHumRead.py", line 64, in getscans
a, b = data.getscan()
File "PyHum/pyread_single.pyx", line 473, in PyHum.pyread_single.pyread.getscan
File "PyHum/pyread_single.pyx", line 502, in PyHum.pyread_single.pyread.getscan
File "PyHum/pyread_single.pyx", line 458, in PyHum.pyread_single.pyread._get_scans
MemoryError
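
One other quick check that seems worth doing, in case a mis-read header is reporting a bogus ping/record size and _get_scans() is over-allocating because of it, is just to compare the raw file sizes against the test data set (standard library only, paths hardcoded for my recording):

import os, glob

# file sizes in bytes, for comparison with the test data set
print('%s: %d bytes' % ('Rec00003.DAT', os.path.getsize('Rec00003.DAT')))
for f in sorted(glob.glob('Rec00003/*.SON')):
    print('%s: %d bytes' % (f, os.path.getsize(f)))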

There's definitely a problem somewhere in the parsing, but I'm not sure how to tackle it, since the failures happen inside the Cython modules, in private attributes that I can't figure out how to access.
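
About the only introspection I can think to try is dir() on the compiled instance and wrapping the call in a try/except, roughly like this (same arguments as in the script above):

import glob
import PyHum.pyread_single as pyread_single

sonfiles = sorted(glob.glob('Rec00003/*.SON'))
data = pyread_single.pyread(sonfiles[0], 'Rec00003.DAT', 1450.0, 2, 'epsg:26949')
print(dir(data))   # I expect only the public methods to show up; the cdef attributes don't
try:
    a, b = data.getscan()
except MemoryError:
    print('MemoryError raised inside _get_scans()')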

Any help would be greatly appreciated! I'd attach the data I'm working off of, but the zipped file is about 130 MB; let me know if there's a good way to get it to you.
