Hi. I learned about this function when Dr. Satija gave a talk at Columbia; thanks for showing us this method! I've been using Seurat to handle some large datasets, and, per the official documentation, the BPCells package has been useful in that regard. However, it occasionally causes unexpected behavior when functions expect a matrix object with "regular" behavior. I ran the following:
myeloid <- CloudAzimuth(myeloid)
and the output was as follows:
Running Pan-Human Azimuth on the cloud!
Error in max(data_check): invalid 'type' (S4) of argument
Traceback:
1. CloudAzimuth(myeloid)
It seems that the data check breaks because it makes typical assumptions about the matrix's structure (the offending line is AzimuthAPI/R/cloud.R, line 21 at commit 5c3cc15):

if ((max(data_check) > 15) || isTRUE(all.equal(data_check, floor(data_check)))) {

Re-creating the data_check object from my myeloid object, I get the following:
> data_check
32941 x 5 IterableMatrix object with class RenameDims
Row names: 7SK.2, A1BG ... ZSWIM8-AS1
Col names: PT_Lee_LUNGT06_NSCLC_AAACGGGGTAGCGTAG, PT_Lee_LUNGT06_NSCLC_AACTCTTCATGCGCAC ... PT_Lee_LUNGT06_NSCLC_AAGGCAGGTTGCGCAC
Data type: double
Storage order: column major
Queued Operations:
1. Concatenate rows of 2 matrix objects with classes: RenameDims, MatrixSubset (threads=0)
2. Reset dimnames
For completeness, here are the classes of relevant objects:
> class(data_check)
'RenameDims'
> class(myeloid)
'Seurat'
> class(myeloid[['RNA']]$counts)
'RenameDims'
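If I'm reading the traceback right, base R's max() raises exactly this error for any S4 object whose class defines no method for the Summary group generic, which I assume is what happens with the queued IterableMatrix here. A minimal base-R reproduction (DummyMat is just an illustrative class, nothing from BPCells):

```r
# Minimal reproduction (base R only): calling max() on an S4 class that
# defines no method for the Summary group generic fails the same way.
setClass("DummyMat", slots = c(x = "numeric"))
d <- new("DummyMat", x = c(1, 2, 3))
msg <- tryCatch(max(d), error = function(e) conditionMessage(e))
msg  # "invalid 'type' (S4) of argument"
```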
For the time being I'm going to try converting the counts matrix to a regular matrix object, but I'm not sure how scalable that is. What are the recommended practices when working with massive datasets?
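In case it's useful, here is a sketch of the conversion I have in mind. The Matrix-package lines just confirm that max() dispatches fine on a plain dgCMatrix; the commented lines are the assumed BPCells coercion (per its documented as(..., "dgCMatrix") support) and would pull the whole matrix into RAM, which is exactly what BPCells is meant to avoid:

```r
library(Matrix)

# Sanity check: max() works on a plain in-memory sparse dgCMatrix,
# unlike on the queued BPCells IterableMatrix above.
m <- Matrix(c(0, 1.5, 20, 0), nrow = 2, sparse = TRUE)  # a dgCMatrix
max(m)  # 20

# The (possibly unscalable) workaround: materialize the counts in memory
# before calling CloudAzimuth. Assumes the standard BPCells coercion and
# my 'myeloid' Seurat object from above.
# counts_mem <- as(myeloid[["RNA"]]$counts, "dgCMatrix")
# myeloid[["RNA"]]$counts <- counts_mem
# myeloid <- CloudAzimuth(myeloid)
```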